Interval Selection in Sliding Windows

Cezar-Mihail Alexandru (supported by EPSRC Doctoral Training Studentship EP/T517872/1)
School of Computer Science, University of Bristol, Bristol, UK
Christian Konrad (supported by EPSRC New Investigator Award EP/V010611/1)
School of Computer Science, University of Bristol, Bristol, UK
Abstract

We initiate the study of the Interval Selection problem in the (streaming) sliding window model of computation. In this problem, an algorithm receives a potentially infinite stream of intervals on the line, and the objective is to maintain at every moment an approximation to a largest possible subset of disjoint intervals among the $L$ most recent intervals, for some integer $L$.

We give the following results:

  1. In the unit-length intervals case, we give a $2$-approximation sliding window algorithm with space $\tilde{O}(|OPT|)$, and we show that any sliding window algorithm that computes a $(2-\varepsilon)$-approximation requires space $\Omega(L)$, for any $\varepsilon > 0$.

  2. In the arbitrary-length case, we give a $(\frac{11}{3}+\varepsilon)$-approximation sliding window algorithm with space $\tilde{O}(|OPT|)$, for any constant $\varepsilon > 0$, which constitutes our main result. (We use the notation $\tilde{O}(\cdot)$ to mean $O(\cdot)$ where $\mathrm{polylog}$ factors and dependencies on $\varepsilon$ are suppressed.) We also show that space $\Omega(L)$ is needed for algorithms that compute a $(2.5-\varepsilon)$-approximation, for any $\varepsilon > 0$.

Our main technical contribution is an improvement over the smooth histogram technique, which consists of running independent copies of a traditional streaming algorithm with different start times. By employing the one-pass $2$-approximation streaming algorithm by Cabello and Pérez-Lantero [Theor. Comput. Sci. ’17] for Interval Selection on arbitrary-length intervals as the underlying algorithm, the smooth histogram technique immediately yields a $(4+\varepsilon)$-approximation in this setting. Our improvement is obtained by forwarding the structure of the intervals identified in a run to the subsequent run, which constrains the shape of an optimal solution and allows us to target optimal intervals differently.

1 Introduction

Sliding Window Model

The sliding window model of computation introduced by Datar et al. [9] captures many of the challenges that arise when processing infinite data streams. In this model, an algorithm receives an infinite stream of data items and is required to maintain, at every moment, a solution to a given problem on the current sliding window, i.e., on the $L$ most recent data items, for an integer $L$. The objective is to design algorithms that use much less space than the size of the sliding window $L$.

Many modern data sources are best modelled as infinite data streams rather than as data sets of large but finite sizes. For example, the sequence of Tweets on X (formerly Twitter), the sequence of IP packets passing through a network router, and continuous sensor measurements for monitoring the physical world are a priori unending. Such data sets typically constitute time-series data, where the resulting data stream is ordered with respect to the data items’ creation times. When processing such streams, it is reasonable to focus on the most recent data items (as modelled in the sliding window model by the sliding window size $L$) since the near past usually affects the present more strongly than older data.

The sliding window model should be contrasted with the more traditional one-pass data streaming model. In the data streaming model, an algorithm processes a finite stream of $n$ data items and is tasked with producing a single output once all items have been processed. Similar to the sliding window model, the objective is to design algorithms that use as little space as possible, in particular, sublinear in the length of the stream. Since sliding window algorithms with $L=n$ can immediately be used in the data streaming model, problems are generally harder to solve in the sliding window model.

Interval Selection Problem

In this work, we initiate the study of the Interval Selection problem in the sliding window model. Given a set $\mathcal{S}$ of $n$ intervals on the real line, the objective is to find a subset $\mathcal{I}\subseteq\mathcal{S}$ of pairwise non-overlapping intervals of maximum cardinality. The problem can also be regarded as the Maximum Independent Set problem in the interval graph associated with the intervals $\mathcal{S}$. We consider both the unit-length case, where all intervals are of length $1$, and the arbitrary-length case, where no restriction on the lengths of the intervals is imposed.
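For intuition about the objective, the offline version of the problem (all intervals available in memory at once) can be solved exactly by the classical greedy rule that scans intervals by increasing right endpoint. The sketch below is only an illustration of the problem definition and is not one of the streaming or sliding window algorithms discussed in this paper; the function name and the convention that intervals are closed (so touching intervals overlap) are assumptions made here.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # closed interval [left, right] (assumed convention)


def max_disjoint_intervals(intervals: List[Interval]) -> List[Interval]:
    """Return a maximum-cardinality set of pairwise disjoint closed intervals.

    Offline greedy: scan by increasing right endpoint and keep every interval
    that starts strictly after the previously kept interval ends.
    """
    chosen: List[Interval] = []
    last_right = float("-inf")
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if left > last_right:
            chosen.append((left, right))
            last_right = right
    return chosen


if __name__ == "__main__":
    print(max_disjoint_intervals([(0, 3), (1, 2), (2.5, 4), (5, 6)]))
```

This needs space linear in the input, which is exactly what the streaming and sliding window settings aim to avoid.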

Interval Selection is fully understood in the one-pass streaming model. Emek et al. [10] gave a $\frac{3}{2}$-approximation streaming algorithm for unit-length intervals and a $2$-approximation streaming algorithm for arbitrary-length intervals. Both algorithms use space $O(|OPT|)$, where $OPT$ denotes an optimal solution, assuming that the space required for storing an interval is $O(1)$. Emek et al. also gave matching lower bounds, showing that, for both the unit-length and the arbitrary-length case, slightly better approximations require space $\Omega(n)$. Subsequently, Cabello and Pérez-Lantero [5] also gave algorithms for the unit-length and the arbitrary-length cases that match the guarantees of those by Emek et al. but are significantly simpler. We will reuse one of the algorithms by Cabello and Pérez-Lantero in this paper. Last, weighted intervals as well as the insertion-deletion setting, where previously inserted intervals can be deleted again, have also been considered [8, 2], where [2] addresses the challenge of outputting the size or weight of a largest/heaviest independent set rather than outputting the intervals themselves.

The Smooth Histogram Technique

Braverman and Ostrovsky [4] introduced the smooth histogram technique, which allows deriving sliding window algorithms from traditional streaming algorithms at the expense of slightly increased space requirements and approximation guarantees. The method works as follows. Given a streaming algorithm $\mathcal{A}$ for a specific problem $P$ that fulfills certain smoothness properties (see [4] for details), multiple copies of $\mathcal{A}$ are run with different starting positions in the stream. The runs are such that consecutive runs differ only slightly in solution quality, and, thus, when a run expires due to the fact that its starting position fell out of the current sliding window, the subsequent run can be used to still yield an acceptable solution. The smooth histogram technique can be applied to the Interval Selection algorithms by Emek et al. [10] and by Cabello and Pérez-Lantero [5], and we immediately obtain sliding window algorithms for both the unit-length and the arbitrary-length cases using space $\tilde{O}(|OPT|)$. For unit-length intervals, the resulting approximation factor is $3+\varepsilon$, for any $\varepsilon > 0$, and for arbitrary-length intervals, the approximation factor is $4+\varepsilon$, for any $\varepsilon > 0$. We will provide the analysis of the $(4+\varepsilon)$-approximation for arbitrary-length intervals in this paper (Theorem 5) since it forms the basis of the analysis of one of our algorithms.

Our Results

In this work, we show that it is possible to improve upon the guarantees obtained from the smooth histogram technique. We give deterministic sliding window algorithms and lower bounds that also apply to randomized algorithms for Interval Selection for both the unit-length and arbitrary-length cases. Our algorithms use space $\tilde{O}(|OPT|)$ at any moment during the processing of the stream, where $OPT$ denotes an optimal solution in the current sliding window. Observe that $OPT$ may vary throughout the processing of the stream, and, thus, the space used by our algorithms may change accordingly.

Regarding unit-length intervals, we give a $2$-approximation sliding window algorithm using $O(|OPT|)$ space, and we prove that any sliding window algorithm with an approximation guarantee of $2-\varepsilon$, for any $\varepsilon > 0$, requires space $\Omega(L)$. Recall that, in the streaming model, a $\frac{3}{2}$-approximation can be achieved with space $O(|OPT|)$. Our lower bound thus establishes a separation between the sliding window and the streaming models for unit-length intervals.

In the arbitrary-length case, we give a $(\frac{11}{3}+\varepsilon)$-approximation sliding window algorithm with space $\tilde{O}(|OPT|)$, improving over the smooth histogram technique, which constitutes our main and most technical result. We also prove that any $(\frac{5}{2}-\varepsilon)$-approximation algorithm, for any $\varepsilon > 0$, requires space $\Omega(L)$. Since, in the streaming model, a $2$-approximation can be achieved with space $O(|OPT|)$, our lower bound also establishes a separation between the sliding window and the streaming models in the arbitrary-length case.

We summarize and contrast our results with results from the streaming model in Figure 1.

| | Streaming model [10, 5]: Algorithm | Streaming model [10, 5]: LB | Sliding window model (this paper): Algorithm | Sliding window model (this paper): LB |
|---|---|---|---|---|
| Unit-length Intervals | $\frac{3}{2}$ | $\frac{3}{2}-\varepsilon$ | $2$ (Thm 3) | $2-\varepsilon$ (Thm 4) |
| Arbitrary-length Intervals | $2$ | $2-\varepsilon$ | $\frac{11}{3}$ (Thm 6) | $\frac{5}{2}-\varepsilon$ (Thm 7) |

Figure 1: Approximation factors achievable in the streaming and sliding window models. All algorithms use space $\tilde{O}(|OPT|)$, while each lower bound result is to be interpreted as stating that achieving the given approximation guarantee requires space $\Omega(n)$ (streaming model) or $\Omega(L)$ (sliding window model). The lower bound results hold for any $\varepsilon > 0$.

A Lack of Lower Bounds in the Sliding Window Model

Interestingly, to the best of our knowledge, for graph problems (recall that the Interval Selection problem is an independent set problem on interval graphs) no separation results between the one-pass streaming and the sliding window models are known. In particular, we are not aware of any space lower bounds for graph problems specifically designed for the sliding window setting, and the only lower bounds that apply are those that carry over from the one-pass streaming setting. Our work is thus the first to establish such a separation. While our results for arbitrary-length intervals are not tight, we stress that for most problems considered, including Maximum Matching and Minimum Vertex Cover, no tight bounds are known. It is unclear whether this is due to a lack of techniques for improved algorithms or for stronger lower bounds.

Techniques

We will first discuss the key ideas behind our results for unit-length intervals, and then discuss our results for arbitrary-length intervals.

Unit-length Intervals. Our algorithm for unit-length intervals is surprisingly simple yet optimal, as established by our lower bound result. For each integer $r$, maintain the latest interval within the current sliding window whose left endpoint lies in the interval $[r,r+1)$ if there is one. We argue that if, at any moment, the algorithm stores $D$ intervals, then we can extract an independent set of size at least $D/2$ by considering either only the stored intervals whose left endpoints lie in $[r,r+1)$ with $r$ odd, or only those with $r$ even, while $|OPT|$ is bounded by $D/2\leq |OPT|\leq D$, which establishes both the approximation factor of $2$ and the space requirements. We note that the idea of considering either only the odd or even intervals for obtaining a $2$-approximation was previously used by [2].
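The two bounds can be spelled out as follows. This is a sketch of the counting argument only; the sets $E$ and $O$ and the word "cell" for an interval $[r,r+1)$ are notation introduced here for illustration, and intervals are assumed to be closed unit intervals.

```latex
\begin{align*}
&E = \{\text{stored intervals whose cell index } r \text{ is even}\}, \quad
 O = \{\text{stored intervals whose cell index } r \text{ is odd}\}, \quad |E| + |O| = D. \\
&\text{If two stored intervals have cells } [r, r+1) \text{ and } [r', r'+1) \text{ with } r' \ge r + 2,
 \text{ the first ends before } r + 2 \le r', \\
&\qquad \text{so } E \text{ and } O \text{ are independent sets and }
 |OPT| \ge \max(|E|, |O|) \ge D/2. \\
&\text{Two window intervals with left endpoints in the same cell overlap, and exactly } D \text{ cells are nonempty,}
 \text{ so } |OPT| \le D. \\
&\text{Hence } \frac{|OPT|}{\max(|E|, |O|)} \le \frac{D}{D/2} = 2 .
\end{align*}
```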

Our lower bound for unit-length intervals is obtained by a reduction to the $\textsf{Index}_n$ problem in the one-way two-party communication setting. In this setting, there are two parties, denoted Alice and Bob. Each party holds a portion of the input data. Alice sends a single message to Bob, who then outputs the result of the computation. The objective is to solve a problem using a message of smallest possible size. In $\textsf{Index}_n$, Alice holds a bit-string $X\in\{0,1\}^n$, and Bob holds an index $J\in[n]$, where $[n]=\{1,2,3,\dots,n\}$, and the objective for Bob is to report the bit $X[J]$. It is well-known that a message of size $\Omega(n)$ is needed to solve the problem.

We argue that a sliding window algorithm $\mathcal{A}$ for Interval Selection on unit-length intervals with approximation guarantee slightly below $2$ can be used to solve $\textsf{Index}_{\Theta(L)}$. To this end, Alice translates the bit-string $X$ into a clique gadget, i.e., a stack of $\Theta(L)$ overlapping interval slots that are slightly shifted from left to right, where interval $i$ is present in the stack if and only if $X[i]=1$. Clique gadgets have been used in all previous space lower bound constructions for intervals [10, 2, 8]. Alice then runs $\mathcal{A}$ on these intervals and sends the memory state of $\mathcal{A}$ to Bob. Bob subsequently feeds an interval located slightly to the right of the slot of interval $J$ into the execution of $\mathcal{A}$ such that Bob’s interval overlaps with all interval slots at positions $\geq J+1$ and does not overlap with any interval slot at positions $\leq J$. The key idea of this reduction is that, since $\mathcal{A}$ is a sliding window algorithm, it must be able to report a valid solution even if any prefix of the intervals of the stack is deleted/has expired. Consider thus the situation when the intervals that are located in the first $J-1$ slots have expired. Then, the resulting instance has an independent set of size $2$ if and only if $X[J]=1$; otherwise, a largest independent set is of size $1$. Since the approximation factor of $\mathcal{A}$ is below $2$, $\mathcal{A}$ can thus distinguish between the two cases and solve $\textsf{Index}_{\Theta(L)}$. Since Alice only sent the memory state of $\mathcal{A}$ to Bob, we also obtain a space lower bound for $\mathcal{A}$. While this description covers the key idea of our lower bound, we note that our actual construction is slightly more involved due to an additional technical challenge. See the proof of Theorem 4 for details.
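The idealized gadget described above can be simulated directly. The following sketch is not the actual construction from the proof of Theorem 4, which is more involved; it only checks the claim that, once the first $J-1$ slots are gone, the optimum is $2$ exactly when $X[J]=1$. The slot shift `delta`, the way expiry is modelled (simply dropping slots $1,\dots,J-1$), and all function names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # closed unit interval [left, left + 1]


def opt_size(intervals: List[Interval]) -> int:
    """Size of a maximum set of pairwise disjoint closed intervals (offline greedy)."""
    count, last_right = 0, float("-inf")
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if left > last_right:
            count, last_right = count + 1, right
    return count


def index_instance_opt(x: List[int], j: int) -> int:
    """Optimum of the idealized instance after slots 1..j-1 expired (j is 1-based)."""
    n = len(x)
    delta = 1.0 / (2 * n)  # hypothetical shift; all n slots still pairwise overlap
    # Alice: slot i holds the interval [i*delta, i*delta + 1] iff x[i] = 1.
    alice: Dict[int, Interval] = {
        i: (i * delta, i * delta + 1.0) for i in range(1, n + 1) if x[i - 1] == 1
    }
    # Bob: starts just to the right of the end of slot j, so his interval
    # misses every slot <= j and overlaps every slot >= j + 1.
    bob_left = j * delta + 1.0 + delta / 2
    bob: Interval = (bob_left, bob_left + 1.0)
    # Idealized expiry: Alice's intervals in slots 1..j-1 are no longer in the window.
    alive = [iv for i, iv in alice.items() if i >= j]
    return opt_size(alive + [bob])


if __name__ == "__main__":
    x = [1, 0, 1, 1]
    for j in range(1, len(x) + 1):
        print(j, x[j - 1], index_instance_opt(x, j))
```

An algorithm with approximation factor below $2$ must report two intervals whenever the optimum is $2$, which is what lets Bob read off $X[J]$ in this simplified picture.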

Our lower bound construction shares similarities with the lower bounds by [2] and [10], as both of these lower bounds also work with clique gadgets and special intervals that render a specific interval in the clique gadget important. In [2], a reduction to the Augmented-Index problem is given in order to obtain a space lower bound for the dynamic streaming setting, where previously inserted intervals can be deleted again at any moment. In Augmented-Index, besides the index $J$, Bob also holds the prefix $X[1,\dots,J-1]$. While in our setting, intervals are deleted due to the shifting sliding window, in their lower bound, intervals are explicitly deleted by Bob.

Arbitrary-length Intervals. Our algorithm and our lower bound for arbitrary-length intervals are substantially more involved, and our $(\frac{11}{3}+\varepsilon)$-approximation algorithm constitutes the main technical result of this paper.

Our algorithm constitutes an improvement over the smooth histogram method. Using the one-pass $2$-approximation streaming algorithm for arbitrary-length intervals by Cabello and Pérez-Lantero [5], which we abbreviate by $\mathcal{CP}$, as the base algorithm of the smooth histogram method, we immediately obtain a $(4+\varepsilon)$-approximation sliding window algorithm using $\tilde{O}(|OPT|)$ space. The key idea of the method is to maintain various runs of $\mathcal{CP}$ with different starting times that are sufficiently spaced out so that only a logarithmic number of runs are needed, yet adjacent runs still have similar output sizes. Then, when a run expires, the subsequent run can still be used to report a good enough solution.

We observe that the executions of $\mathcal{CP}$ in the smooth histogram method are independent. Our key contribution that gives rise to our improvement is to forward the structure identified in a run of $\mathcal{CP}$ to the subsequent run. The $\mathcal{CP}$ algorithm, which we will discuss in detail in Section 4.1.1, maintains a partition of the real line that restrains the possible locations of optimal intervals that are yet to arrive in the stream. We target these locations individually in the subsequent run by initiating additional runs of $\mathcal{CP}$ on restricted domains where we expect to find many of these optimal intervals.

Our approach relies on a property of the $\mathcal{CP}$ algorithm that, at first glance, seems relatively insignificant. As proved by Cabello and Pérez-Lantero, the $\mathcal{CP}$ algorithm produces a solution of size at least $(|OPT|+1)/2$, and thus only has an approximation factor of $2$ in an asymptotic sense. Consequently, if $|OPT|$ is a small constant, then the algorithm achieves an approximation factor strictly below $2$. We exploit this property in that we execute the additional runs of $\mathcal{CP}$ on small domains where we expect to find only a small constant number of optimal intervals; see Section 4.1.3 for further details.
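To make the gain concrete, the guarantee $(|OPT|+1)/2$ can be converted into an explicit ratio. In the short calculation below, $ALG$ is a name introduced here for the solution output by $\mathcal{CP}$.

```latex
\[
  |ALG| \ge \frac{|OPT|+1}{2}
  \quad\Longrightarrow\quad
  \frac{|OPT|}{|ALG|} \le \frac{2\,|OPT|}{|OPT|+1} = 2 - \frac{2}{|OPT|+1} < 2 ,
\]
\[
  |OPT| = 1:\ \tfrac{|OPT|}{|ALG|} \le 1, \qquad
  |OPT| = 3:\ \tfrac{|OPT|}{|ALG|} \le \tfrac{3}{2}, \qquad
  |OPT| = 5:\ \tfrac{|OPT|}{|ALG|} \le \tfrac{5}{3} .
\]
```

The bound approaches $2$ only as $|OPT|$ grows, which is why restricting runs to domains with few optimal intervals helps.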

Our $(2.5-\varepsilon)$-approximation lower bound for arbitrary-length intervals is also achieved via a reduction to a hard problem in one-way communication complexity. However, instead of exploiting the hardness of the two-party problem Index as in the unit-length case, we use the three-party problem $\textsf{Chain}_3$ introduced by Cormode et al. [6]. In $\textsf{Chain}_3$, the first two parties and the last two parties hold separate Index instances $(X_1,J_1),(X_2,J_2)\in\{0,1\}^n\times[n]$ that are correlated in that they have the same answer bit, i.e., $X_1[J_1]=X_2[J_2]=:x$, and the objective for the third party is to determine the bit $x$. $\textsf{Chain}_3$ also requires a message of size $\Omega(n)$ to be solved. Similar to the unit-length case, the first two parties introduce clique gadgets based on the bit-strings $X_1$ and $X_2$, and the third party introduces additional crucial intervals. The strength of using $\textsf{Chain}_3$ is that, if the answer bit is zero, then the crucial intervals corresponding to $X_1[J_1]$ and $X_2[J_2]$ of all clique gadgets are missing, while if the answer bit is one, then all of these intervals are present. The method thus allows us to work with multiple clique gadgets instead of only a single one, which we exploit to obtain a stronger lower bound. See Section 4.2 for details.

Further Related Work

Crouch et al. [7] initiated the study of graph problems in the sliding window model (recall that Interval Selection is an independent set problem on interval graphs). They showed that, similar to the streaming model, there exist sliding window algorithms that use space $\tilde{O}(n)$ for deciding Connectivity and Bipartiteness, where $n$ is the number of vertices in the input graph. They also gave positive results for the computation of cut-sparsifiers, spanners and minimum spanning trees, and they initiated the study of the Maximum Matching problem in the sliding window model (see below).

The smooth histogram technique has been successfully applied for designing sliding window algorithms for graph problems, and the state-of-the-art sliding window algorithms for Maximum Matching and Minimum Vertex Cover rely on the smooth histogram technique.

For Maximum Matching, a $2$-approximation with space $\tilde{O}(n)$ can easily be achieved in the streaming model by running the Greedy matching algorithm, and the smooth histogram method immediately yields a $(4+\varepsilon)$-approximation sliding window algorithm when built on Greedy. Crouch et al. [7] observed that the resulting algorithm can be analyzed more precisely and showed that it actually yields a $(3+\varepsilon)$-approximation sliding window algorithm. Regarding the weighted version of the Maximum Matching problem, the smooth histogram technique immediately yields a $(4+\varepsilon)$-approximation using the $(2+\varepsilon)$-approximation streaming algorithm by [15], and, again, as proved by Biabani et al. [3], the analysis can be tailored to the Maximum Matching problem to establish an approximation factor of $3.5+\varepsilon$ without changing the algorithm. Alexandru et al. [1] then improved the approximation factor to $3+\varepsilon$ by running the smooth histogram algorithm with a slightly different objective function.

Regarding the Minimum Vertex Cover problem, a smooth histogram-based algorithm is known to yield an approximation factor of $(3+\varepsilon)$ [16], improving over previous work [13].

Outline

In Section 2, we give notation, provide some clarification on the sliding window model, and introduce hard communication problems that we rely on for proving our lower bound results. Then, in Section 3, we give our algorithm and lower bound for the case of unit-length intervals, and in Section 4, we give our algorithm and lower bound for arbitrary-length intervals. Finally, we conclude in Section 5 with open problems.

2 Preliminaries

For a set of intervals $\mathcal{I}$, we denote by $OPT(\mathcal{I})$ an independent subset of $\mathcal{I}$ of maximum size. We also apply $OPT(\cdot)$ to substreams of intervals and to data structures that store intervals.

Sliding Window Algorithms

Throughout the document, we denote by $L$ the size of the sliding window, and we assume that $L$ is large enough, i.e., larger than a suitably large constant. For two streams of intervals $A,B$ we denote the stream that is obtained by concatenating $A$ and $B$ simply by $AB$, i.e., we omit a concatenation symbol. Furthermore, for simplicity, we assume that the space required to store an interval is $O(1)$. However, if instead $k$ bits are accounted for storing an interval, then the space complexities of our algorithms need to be multiplied by $k$.

Communication Complexity

As is standard in the data streaming literature, our space lower bounds are proved via reductions to problems in the one-way communication setting. In this setting, multiple parties $P_1,P_2,\dots,P_k$ each hold a portion of the input data and communicate in order to solve a problem. Communication is one-way, i.e., party $P_1$ sends a message to $P_2$, who in turn sends a message to $P_3$. This continues until party $P_k$ has received a message from party $P_{k-1}$ and then outputs the result of the computation. The parties can make use of public and private randomness and need to report a correct solution with probability at least $2/3$. We refer the reader to [14] for an introduction to communication complexity.

We will exploit the hardness of the two-party communication problem $\textsf{Index}_n$, where we denote the first party by Alice and the second party by Bob, and the $k$-party communication problem $\textsf{Chain}_k$, which was recently introduced by Cormode et al. [6].

$\textsf{Index}_n$: Input: Alice holds a bit-string $X\in\{0,1\}^n$, and Bob holds an index $J\in[n]$. Output: Bob outputs $X[J]$.

It is well-known that solving $\textsf{Index}_n$ requires Alice to send a message of size $\Omega(n)$.

Theorem 1 (e.g. [12]).

Every randomized constant-error one-way communication protocol for $\textsf{Index}_n$ requires a message of size $\Omega(n)$.

The problem $\textsf{Chain}_k(n)$ can be regarded as chaining together $k-1$ instances of $\textsf{Index}_n$, where the instances are correlated in that they are guaranteed to have the same output.

$\textsf{Chain}_k(n)$: Input: For $1\leq i\leq k-1$, player $P_i$ receives a bitvector $X_i\in\{0,1\}^n$. Additionally, for any $2\leq i\leq k$, player $P_i$ receives an index $J_{i-1}\in[n]$. The inputs are correlated such that $X_1[J_1]=X_2[J_2]=\dots=X_{k-1}[J_{k-1}]=x\in\{0,1\}$. Output: Player $P_k$ outputs $x$.
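To make the input distribution concrete, the sketch below samples a $\textsf{Chain}_k(n)$ instance that satisfies the promise and lists what each player sees. It only illustrates the problem definition and says nothing about protocols or lower bounds; the function names, the seed parameter, and the dictionary layout of the player inputs are assumptions made here.

```python
import random
from typing import Dict, List, Tuple


def sample_chain_instance(k: int, n: int, x: int, seed: int = 0) -> Tuple[List[List[int]], List[int]]:
    """Sample X_1..X_{k-1} in {0,1}^n and J_1..J_{k-1} in [n] with X_i[J_i] = x."""
    rng = random.Random(seed)
    vectors: List[List[int]] = []
    indices: List[int] = []
    for _ in range(k - 1):
        bits = [rng.randint(0, 1) for _ in range(n)]
        j = rng.randint(1, n)  # 1-based index, as in the definition
        bits[j - 1] = x        # enforce the promise on this Index instance
        vectors.append(bits)
        indices.append(j)
    return vectors, indices


def player_inputs(vectors: List[List[int]], indices: List[int]) -> Dict[int, dict]:
    """Player P_i holds X_i (for i <= k-1) and J_{i-1} (for i >= 2)."""
    k = len(vectors) + 1
    inputs: Dict[int, dict] = {}
    for i in range(1, k + 1):
        inputs[i] = {
            "X": vectors[i - 1] if i <= k - 1 else None,
            "J": indices[i - 2] if i >= 2 else None,
        }
    return inputs


def satisfies_promise(vectors: List[List[int]], indices: List[int]) -> bool:
    """Check that all chained Index instances share the same answer bit."""
    return len({bits[j - 1] for bits, j in zip(vectors, indices)}) == 1
```

For $k=3$ this gives exactly the two correlated Index instances $(X_1,J_1)$ and $(X_2,J_2)$ used in the arbitrary-length lower bound.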

Sundaresan [17] recently settled the communication complexity of $\textsf{Chain}_k(n)$, improving over the previous lower bounds by Cormode et al. [6] and Feldman et al. [11]:

Theorem 2 ([17]).

Every constant-error one-way communication protocol that solves $\textsf{Chain}_k(n)$ requires at least one message of size $\Omega(n/k)$.

3 Unit-length Intervals

In this section, we give our sliding window algorithm (Subsection 3.1) and our lower bound (Subsection 3.2) for unit-length intervals.

3.1 Sliding Window Algorithm for Unit-length Intervals

We now describe our algorithm for unit-length intervals.

Algorithm 1 Sliding window algorithm for Interval Selection on unit-length intervals

Input: Stream $S$ of unit-length intervals, window length $L$

  Initialization:

    $\texttt{latest} \leftarrow \emptyset$ (the indexed set of stored intervals)

  Streaming:

    while an interval $I = [r, r+1]$ is revealed, for some real number $r$ do
        $\texttt{latest}(\lfloor r \rfloor) \leftarrow I$
        if $\exists\ J' = [r', r'+1] \in \texttt{latest}$ that has expired then
            $\texttt{latest}(\lfloor r' \rfloor) \leftarrow \emptyset$

  Post-processing:

    Return $OPT(\texttt{latest})$

Our algorithm is simple: For each integer $r$, the algorithm maintains in $\texttt{latest}(r)$ the latest interval of the current sliding window with its left boundary in $[r, r+1)$. The key observation, which was also used in [2], is that the intervals $\{\texttt{latest}(r) : r \text{ even}\}$ and $\{\texttt{latest}(r) : r \text{ odd}\}$ form independent sets, and one of these sets constitutes a $2$-approximation.
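The bookkeeping described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the class name `UnitIntervalWindow`, the method names, and the choice to represent an interval $[r, r+1]$ by its left endpoint `r` are assumptions made for this example. Expired entries are found by a linear scan for simplicity, and $OPT(\texttt{latest})$ is computed with the standard greedy rule by right endpoint.

```python
import math


class UnitIntervalWindow:
    """Sketch of Algorithm 1 for unit-length intervals [r, r + 1]."""

    def __init__(self, window_length):
        self.L = window_length
        self.time = 0      # number of intervals revealed so far
        self.latest = {}   # floor(r) -> (arrival time, r)

    def insert(self, r):
        self.time += 1
        self.latest[math.floor(r)] = (self.time, r)
        # An interval has expired once it is no longer among the L most recent.
        expired = [c for c, (t, _) in self.latest.items() if t <= self.time - self.L]
        for c in expired:
            del self.latest[c]

    def solution(self):
        # OPT(latest): for unit-length intervals, sorting by left endpoint
        # equals sorting by right endpoint, so the greedy scan is optimal.
        chosen, last_right = [], -math.inf
        for _, r in sorted(self.latest.values(), key=lambda e: e[1]):
            if r > last_right:   # closed intervals need a strict gap
                chosen.append((r, r + 1))
                last_right = r + 1
        return chosen
```

For example, with `L = 3`, inserting the left endpoints `0.2, 0.7, 1.5, 3.1` leaves one stored interval in each of the cells `0`, `1`, and `3`, and `solution()` returns two disjoint intervals.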

Theorem 3.

Algorithm 1 is a deterministic $2$-approximation sliding window algorithm for Interval Selection on unit-length intervals that, at any moment, uses $O(|OPT|)$ space, where $OPT$ is a maximum independent set of intervals in the current sliding window.

Proof.

We will first prove that Algorithm 1 indeed computes a $2$-approximation, and then argue that the algorithm satisfies the memory requirements.

We call a unit-length interval $I$ active if it is included in the current sliding window (i.e., it is one of the $L$ most recent intervals of the stream). Otherwise, we say that $I$ is expired.

Let $OPT$ be a maximum independent set in the current sliding window and let $ALG$ be the independent set reported by Algorithm 1. Define the indexed set $\texttt{latest}$ as in the algorithm.

Approximation

We will show that

$$|OPT| \leq |\texttt{latest}| \leq 2 \cdot |ALG| \qquad (1)$$

holds, which then establishes the approximation factor of the sliding window algorithm.

First, we will prove that $|OPT| \leq |\texttt{latest}|$ holds. To this end, we will show that the function $f : OPT \to \texttt{latest}$ defined as $f([x,x+1]) = \texttt{latest}(\lfloor x \rfloor)$ is injective.

We will first argue that $f$ is well-defined in that $\texttt{latest}(\lfloor x \rfloor)$ exists, for every $[x,x+1] \in OPT$. Indeed, by inspecting the algorithm, when $I := [x,x+1] \in OPT$ arrives in the stream, $\texttt{latest}(\lfloor x \rfloor)$ is set to $I$, and, in particular, while $I$ is active, $\texttt{latest}(\lfloor x \rfloor)$ is never set to $\emptyset$. It may, however, happen that it is replaced with an interval that arrived after $I$. In both cases, $\texttt{latest}(\lfloor x \rfloor)$ holds an active interval, and $f$ is well-defined.

To see that $f$ is injective, observe that for any two intervals in $OPT$, since these intervals are independent and of unit length, the integer parts of their left endpoints are distinct. Hence, $f(I_1) \neq f(I_2)$, for any two distinct intervals $I_1, I_2 \in OPT$.

Since $f$ is well-defined and injective, we obtain that $|OPT| \leq |\texttt{latest}|$, which proves the first inequality of Inequality (1). It remains to prove the second, i.e., that $|\texttt{latest}| \leq 2 \cdot |ALG|$ also holds.

To see this, observe that, for two integers $x \neq y$ of the same parity, $\texttt{latest}(x)$ and $\texttt{latest}(y)$ (if they exist) are independent. This is because $|y - x| \geq 2$ and the intervals have unit length. By the pigeonhole principle, there are at least $\frac{|\texttt{latest}|}{2}$ intervals whose indices in $\texttt{latest}$ have the same parity. These intervals form an independent set, and since $ALG$ is a maximum independent set among the intervals in $\texttt{latest}$, this implies that $|ALG| \geq \frac{|\texttt{latest}|}{2}$.

Space

The algorithm stores $|\texttt{latest}|$ intervals of the current sliding window. As proved above, we have $|\texttt{latest}| \leq 2|ALG| \leq 2|OPT|$, which implies that the space used by the algorithm is $O(|OPT|)$.

3.2 Space Lower Bound

We now show that sliding window algorithms that use space $o(L)$ cannot compute a $(2-\varepsilon)$-approximation to Interval Selection on unit-length intervals, for any $\varepsilon > 0$. Recall that, in the streaming model, a $\frac{3}{2}$-approximation can be computed with space $O(|OPT|)$.

Theorem 4.

Let $\varepsilon > 0$ be any small constant. Then, any algorithm in the sliding window model that computes a $(2-\varepsilon)$-approximate solution to Interval Selection on unit-length intervals with probability at least $2/3$ requires a memory of size $\Omega(L)$.

Proof.

Let $\mathcal{A}$ be a sliding window algorithm for Interval Selection on unit-length intervals with approximation factor $2-\varepsilon$, for some $\varepsilon > 0$.

We will show how $\mathcal{A}$ can be used in order to obtain a communication protocol for $\textsf{Index}_{L-2}$.

To this end, let $(X, J) \in \{0,1\}^{L-2} \times [L-2]$ be Alice and Bob's input to $\textsf{Index}_{L-2}$. The two players proceed as follows:

  • Alice: Alice feeds the intervals $I_1, I_2, \dots, I_{L-2}$ into $\mathcal{A}$ (in the given order), where

    $$I_i = \begin{cases} \left[\frac{i}{2L-1},\ 1+\frac{i}{2L-1}\right], & \text{if } X[i]=1, \\ \left[1-\frac{i}{L^2},\ 2-\frac{i}{L^2}\right], & \text{if } X[i]=0. \end{cases}$$

    Alice then sends the memory state of $\mathcal{A}$ to Bob.

  • Bob: Using Alice's message, Bob continues the execution of $\mathcal{A}$ and feeds the interval

    $$I_{L-1} = \left[1+\frac{J}{2L-1}+\frac{1}{(2L-1)^2},\ 2+\frac{J}{2L-1}+\frac{1}{(2L-1)^2}\right]$$

    into $\mathcal{A}$. Bob also adds the intervals $I_i = [\frac{i}{2L-1}, 1+\frac{i}{2L-1}]$ to $\mathcal{A}$, for $L \leq i \leq L+J-1$, in order to make the sliding window of the algorithm $\mathcal{A}$ advance. Bob computes $\mathcal{A}$'s output in the latest sliding window, which consists of the intervals $\mathcal{S} = \{I_i \mid J \leq i \leq L+J-1\}$.

This construction is illustrated in Figure 2.

Figure 2: This figure illustrates the instances created by Alice and Bob in the proof of Theorem 4 for an instance of $\textsf{Index}_{L-2}$ with $X[J] = 1$. The dashed intervals on the upper part correspond to the zero elements of the bitvector $X$. The red intervals $I_1$, $I_2$ correspond to expired intervals. $I_J$ is the only non-expired interval disjoint from the special interval $I_{L-1}$. Since $X[J] = 1$, the optimal solution is of size $2$. If $X[J]$ were equal to $0$ then the interval $I_J$ would not be disjoint from $I_{L-1}$, and, thus, an optimal solution would be of size $1$.

We observe that if $X[J] = 1$ then $|OPT(\mathcal{S})| = |\{I_J, I_{L-1}\}| = 2$, while if $X[J] = 0$ then $|OPT(\mathcal{S})| = 1$. Since $\mathcal{A}$ has an approximation factor of $2-\varepsilon$, if $X[J] = 1$ then $\mathcal{A}$ must report a solution of size at least $\frac{2}{2-\varepsilon} > 1$, i.e., the unique solution of size $2$, while if $X[J] = 0$ it reports a solution of size $1$. Bob can thus distinguish between the two cases and solve $\textsf{Index}_{L-2}$ with the same success probability as $\mathcal{A}$.

Since the protocol solves $\textsf{Index}_{L-2}$, by Theorem 1, the protocol must use a message of size $\Omega(L)$. The protocol's message is $\mathcal{A}$'s memory state, and, hence, $\mathcal{A}$ must use space $\Omega(L)$.
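The reduction can be checked on small inputs with the following Python sketch. It builds the final window $\mathcal{S}$ from the definitions of $I_i$ in the proof and computes the size of an optimal solution with the standard greedy rule (earliest right endpoint first). The function names `build_window` and `opt_size` are hypothetical helpers introduced only for this illustration, and exact rational arithmetic is used so that the tiny offset $\frac{1}{(2L-1)^2}$ is not lost to rounding.

```python
from fractions import Fraction


def build_window(X, J):
    """Return the window S = {I_i : J <= i <= L + J - 1} for bits X[1..L-2]."""
    L = len(X) + 2
    d = Fraction(1, 2 * L - 1)

    def alice(i):
        if X[i - 1] == 1:                       # X is 1-indexed in the text
            return (i * d, 1 + i * d)
        return (1 - Fraction(i, L * L), 2 - Fraction(i, L * L))

    stream = {i: alice(i) for i in range(1, L - 1)}          # I_1 .. I_{L-2}
    stream[L - 1] = (1 + J * d + d * d, 2 + J * d + d * d)   # Bob's special interval
    for i in range(L, L + J):                                # Bob's padding intervals
        stream[i] = (i * d, 1 + i * d)
    return [stream[i] for i in range(J, L + J)]


def opt_size(intervals):
    """Size of a maximum set of pairwise disjoint closed intervals (greedy)."""
    count, last_right = 0, None
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if last_right is None or left > last_right:
            count += 1
            last_right = right
    return count


X = [1, 0, 1, 1, 0]
print([opt_size(build_window(X, J)) for J in range(1, len(X) + 1)])
```

On this input the printed sizes equal $1 + X[J]$ for every $J$, which is exactly the gap that Bob exploits.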

4 Arbitrary-length Intervals

In this section, we give our $(\frac{11}{3}+\varepsilon)$-approximation sliding window algorithm and our $(\frac{5}{2}-\varepsilon)$-approximation lower bound for Interval Selection on arbitrary-length intervals.

4.1 $(\frac{11}{3}+\varepsilon)$-approximation Sliding Window Algorithm

Our algorithm is obtained by running multiple instances of the Cabello and Pérez-Lantero streaming algorithm for Interval Selection on arbitrary-length intervals [5]. In the following, we abbreviate the algorithm by $\mathcal{CP}$. Since we employ various properties of the $\mathcal{CP}$ algorithm, we discuss the $\mathcal{CP}$ algorithm in Subsection 4.1.1. We use the $\mathcal{CP}$ algorithm in the context of the smooth histogram technique, which we discuss in Subsection 4.1.2. Finally, we give our sliding window algorithm and its analysis in Subsection 4.1.3.

4.1.1 Cabello and Pérez-Lantero Algorithm

For an interval $I = [a,b]$, we define $\text{left}(I) = a$ and $\text{right}(I) = b$.

The Cabello and Pérez-Lantero algorithm is depicted in Algorithm 2.

The listing of the algorithm uses the auxiliary functions $\text{left}(\cdot)$ and $\text{right}(\cdot)$, which return the left and right delimiters of an interval, respectively.

Algorithm 2 Cabello and Pérez-Lantero Algorithm ($\mathcal{CP}$)

Input: A stream $S$ of intervals

  Initialization:

    $\mathcal{R} \leftarrow \{\mathbb{R}\}$ (a region partition)
    $\text{leftmost}(\mathbb{R}) = \text{rightmost}(\mathbb{R}) = \mathbb{R}$

  Streaming:

    while an interval $I$ is revealed do
        if there exists $R = [a,b) \in \mathcal{R}$ such that $I \subseteq R$ then
            if $I \cap \text{leftmost}(R) \cap \text{rightmost}(R) \neq \emptyset$ then
                if $\text{right}(I) \leq \text{right}(\text{leftmost}(R))$ then $\text{leftmost}(R) \leftarrow I$
                if $\text{left}(I) \geq \text{left}(\text{rightmost}(R))$ then $\text{rightmost}(R) \leftarrow I$
            else
                if $\text{right}(I) \leq \text{left}(\text{rightmost}(R))$ then
                    Let $R_1 \leftarrow (a, \text{right}(I)]$
                    Let $R_2 \leftarrow (\text{right}(I), b]$
                    $\text{leftmost}(R_1) \leftarrow I$, $\text{rightmost}(R_1) \leftarrow I$
                    $\text{leftmost}(R_2) \leftarrow \text{rightmost}(R)$, $\text{rightmost}(R_2) \leftarrow \text{rightmost}(R)$
                else
                    Let $R_1 \leftarrow [a, \text{left}(I))$
                    Let $R_2 \leftarrow [\text{left}(I), b)$
                    $\text{leftmost}(R_2) \leftarrow I$, $\text{rightmost}(R_2) \leftarrow I$
                    $\text{leftmost}(R_1) \leftarrow \text{leftmost}(R)$, $\text{rightmost}(R_1) \leftarrow \text{leftmost}(R)$
                Insert $R_1, R_2$ into $\mathcal{R}$
                Remove $R$ from $\mathcal{R}$

  Post-processing:

    Return $\{\text{leftmost}(R) \mid R \in \mathcal{R}\}$

The key idea behind the algorithm is to maintain a partition $\mathcal{R}$ of the real line $\mathbb{R}$ that we refer to as a region partition. Initially, the algorithm starts with the single region $\mathcal{R} = \{\mathbb{R}\}$, and as the algorithm proceeds, the real line is partitioned into half-open intervals. This is achieved as follows. Arriving intervals that cross a region boundary are ignored. Consider thus an arriving interval $I$ that lies entirely within a region. In each region, the algorithm stores the left-most (the interval with the left-most right delimiter) and right-most (the interval with the right-most left delimiter) intervals within the region that it has observed thus far. If the interval $I$ together with either the left-most or the right-most interval of the region forms an independent set of size two then the region is split into two regions and the left-most and right-most intervals are updated accordingly. Otherwise, if $I$ intersects both the left-most and right-most intervals of the region then $I$ is only used to potentially replace the left-most and/or right-most intervals of the region.
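The region-splitting logic described above can be sketched in Python as follows. This is a hedged illustration rather than the algorithm of [5] verbatim: the class name `CPSketch`, the list-based region store, and the linear scan for the containing region are assumptions of this example. Two further choices are made only for the sketch. First, a region boundary is stored as a pair `(v, 0)` (a cut just before the point `v`) or `(v, 1)` (a cut just after `v`), so that both split rules can be expressed uniformly. Second, since the intervals are closed, the test for "$I$ lies to the left of the right-most interval" uses a strict inequality.

```python
import math

INF = math.inf


class CPSketch:
    """Illustrative region-partition sketch; intervals are (left, right) tuples."""

    def __init__(self):
        whole_line = (-INF, INF)   # sentinel playing the role of the real line
        # region = [lower cut, upper cut, leftmost interval, rightmost interval]
        self.regions = [[(-INF, 0), (INF, 1), whole_line, whole_line]]

    def feed(self, interval):
        l, r = interval
        for idx, (lo, hi, lm, rm) in enumerate(self.regions):
            if lo < (l, 0.5) and (r, 0.5) < hi:   # interval lies inside the region
                break
        else:
            return                                # crosses a boundary: ignored
        if max(l, lm[0], rm[0]) <= min(r, lm[1], rm[1]):
            # I meets both stored intervals: only refresh leftmost/rightmost.
            if r <= lm[1]:
                self.regions[idx][2] = interval
            if l >= rm[0]:
                self.regions[idx][3] = interval
        elif r < rm[0]:
            cut = (r, 1)   # split just after right(I)
            self.regions[idx:idx + 1] = [[lo, cut, interval, interval],
                                         [cut, hi, rm, rm]]
        else:
            cut = (l, 0)   # split just before left(I)
            self.regions[idx:idx + 1] = [[lo, cut, lm, lm],
                                         [cut, hi, interval, interval]]

    def solution(self):
        # One interval per region; the initial sentinel is not reported.
        return [reg[2] for reg in self.regions if reg[2][0] > -INF]
```

Feeding the intervals `(0, 1)`, `(2, 3)`, `(0.5, 2.5)`, `(4, 5)`, `(1.5, 1.8)` in this order produces solutions of sizes 1, 2, 2, 3, 4: the third interval crosses a region boundary and is ignored, and the solution size never decreases.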

Some key properties of the algorithm that we will reuse in this work are summarized in Figure 3 (see [5] for proofs).

C1 For each region $R \in \mathcal{R}$, the $\mathcal{CP}$ algorithm stores at least one (and at most two) intervals, and the input instance is such that there are no two disjoint intervals that lie within region $R$.

C2 The $\mathcal{CP}$ algorithm outputs a solution of size $|\mathcal{R}|$, i.e., one interval per region. Furthermore, we have that $|\mathcal{R}| \geq \frac{|OPT|+1}{2}$, i.e., the algorithm has an approximation factor slightly better than $2$.

C3 The algorithm uses space $O(|OPT|)$.

Figure 3: Key Properties of the $\mathcal{CP}$ Algorithm.

Besides these properties, we require another property that allows us to employ the algorithm in the context of the smooth histogram technique:

Lemma 1.

The $\mathcal{CP}$ algorithm is monotonic, i.e., for any two streams of intervals $A, B$ we have that

$$|\mathcal{CP}(A)| \leq |\mathcal{CP}(AB)|.$$
Proof.

The output produced by the $\mathcal{CP}$ algorithm consists of one interval per region. The lemma then follows since, by construction, the number of regions cannot decrease. ∎

4.1.2 The Smooth Histogram Technique

The $\mathcal{CP}$ algorithm can be employed in the context of the smooth histogram method to yield a $(4+\varepsilon)$-approximation sliding window algorithm for Interval Selection for arbitrary-length intervals that uses space $\tilde{O}(|OPT|)$. This is achieved as follows (see Algorithm 3):

Algorithm 3 Smooth Histogram Technique applied to the $\mathcal{CP}$ algorithm

    while an interval $I$ is revealed do
        Create a new instance of the $\mathcal{CP}$ algorithm
        Feed $I$ into all $\mathcal{CP}$ instances that are currently running
        Clean-up:
            Remove the oldest run of $\mathcal{CP}$ if it has expired
            Denote by $\mathcal{CP}_1, \mathcal{CP}_2, \dots$ the runs sorted in increasing order with respect to their starting positions. Then, repeatedly remove a run $\mathcal{CP}_i$ if the maintained solutions of the adjacent runs $\mathcal{CP}_{i-1}$ and $\mathcal{CP}_{i+1}$ are within a factor of $1+\varepsilon$ in size
        Output the solution of the oldest run

Upon the arrival of a new interval, Algorithm 3 first creates a new run of the $\mathcal{CP}$ algorithm and feeds the new interval into all currently running copies of $\mathcal{CP}$. The method relies on a clever way of deleting unnecessary runs: A run is deleted if it has expired (i.e., it contains an interval that arrived before the start of the current sliding window) or if the closest run with an earlier start time and the closest run with a later start time are such that their solutions differ in size by less than a $1+\varepsilon$ factor. Consider the moment after a clean-up took place, and let us denote the stored runs by $\mathcal{CP}_1, \dots, \mathcal{CP}_\ell$. The clean-up rule implies that the stored runs have the properties depicted in Figure 4.

S1 For every $i \leq \ell-2$: $|\mathcal{CP}_i| \geq (1+\varepsilon)|\mathcal{CP}_{i+2}|$, and

S2 Either $|\mathcal{CP}_i| \leq (1+\varepsilon)|\mathcal{CP}_{i+1}|$ holds, or, if $|\mathcal{CP}_i| \geq (1+\varepsilon)|\mathcal{CP}_{i+1}|$, then the starting positions of runs $i$ and $i+1$ differ by only a single interval.

Figure 4: Key Properties of the Smooth Histogram Technique.

Property S1 implies that there are at most $O(\log_{1+\varepsilon}(L))$ active runs of $\mathcal{CP}$ and thus the space of Algorithm 3 is at most a factor $O(\log_{1+\varepsilon}(L))$ larger than the space used by $\mathcal{CP}$.
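The counting behind this bound can be made explicit with a short derivation. It is a sketch under two assumptions that are not spelled out at this point in the text: every stored run started inside the current window, so its solution has size at most $L$, and the most recent run holds a solution of size at least $1$. The symbol $m = \lfloor (\ell-1)/2 \rfloor$ is a helper introduced only for this illustration. Applying Property S1 to the runs with odd indices gives

\begin{align*}
L \;\geq\; |\mathcal{CP}_1| \;\geq\; (1+\varepsilon)\,|\mathcal{CP}_3| \;\geq\; \dots \;\geq\; (1+\varepsilon)^{m}\,|\mathcal{CP}_{1+2m}| \;\geq\; (1+\varepsilon)^{m},
\end{align*}

so that $m \leq \log_{1+\varepsilon}(L)$ and therefore $\ell \leq 2m + 2 \leq 2\log_{1+\varepsilon}(L) + 2$.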

Property S2 implies that either consecutive runs differ by at most a $1+\varepsilon$ factor in solution size or are such that their starting times differ by only a single interval.
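The clean-up rule of Algorithm 3 can be sketched generically in Python. The class `SmoothHistogram` below works with any run object that offers `feed(item)` and `size()` and whose size never decreases; these method names are assumptions of this sketch. To keep the example self-contained, `GreedyRun` is a hypothetical stand-in for a $\mathcal{CP}$ run that simply keeps a greedily built set of disjoint intervals; it is monotonic but does not share the approximation guarantee of $\mathcal{CP}$.

```python
class GreedyRun:
    """Hypothetical stand-in for one CP run: keeps a greedy independent set."""

    def __init__(self):
        self.kept = []

    def feed(self, interval):
        l, r = interval
        if all(r < a or b < l for a, b in self.kept):   # disjoint from all kept
            self.kept.append(interval)

    def size(self):
        return len(self.kept)


class SmoothHistogram:
    def __init__(self, make_run, window_length, eps):
        self.make_run = make_run
        self.L = window_length
        self.eps = eps
        self.time = 0
        self.runs = []   # [start time, run], oldest first

    def feed(self, item):
        self.time += 1
        self.runs.append([self.time, self.make_run()])
        for _, run in self.runs:
            run.feed(item)
        # Remove the oldest run if it started before the current window.
        if self.runs[0][0] <= self.time - self.L:
            self.runs.pop(0)
        # Remove run i while its two neighbours are within a (1 + eps) factor.
        i = 1
        while i + 1 < len(self.runs):
            older, newer = self.runs[i - 1][1], self.runs[i + 1][1]
            if older.size() <= (1 + self.eps) * newer.size():
                del self.runs[i]
            else:
                i += 1

    def output(self):
        return self.runs[0][1]   # the oldest stored run
```

A single left-to-right pass suffices for monotone runs: deleting a run only replaces a neighbour by a newer run with a solution that is no larger, so a run that was kept earlier in the pass stays non-deletable.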

We now prove that Algorithm 3 is a $(4+2\varepsilon)$-approximation algorithm for Interval Selection for arbitrary-length intervals. The proof offers insight into how the analysis of our more involved $(\frac{11}{3}+\varepsilon)$-approximation algorithm is conducted and thus serves as a warm-up.

Theorem 5.

Algorithm 3 is a $(4+2\varepsilon)$-approximation sliding window algorithm for Interval Selection on arbitrary-length intervals that uses space $\tilde{O}(|OPT|)$, where $OPT$ denotes an optimal solution in the current sliding window.

Proof.

Recall that the output of Algorithm 3 is the output of the oldest run of $\mathcal{CP}$, and let us denote this run by $\mathcal{CP}_1$. First, if the start position of $\mathcal{CP}_1$ coincides with the oldest interval of the current sliding window then $\mathcal{CP}_1$ was run on the entire window and we immediately obtain an approximation factor of $2$ (Property C2 of the $\mathcal{CP}$ algorithm). Hence, suppose that this is not the case, and denote the run that has expired most recently by $\mathcal{CP}_0$. Observe that, prior to $\mathcal{CP}_0$ expiring, the runs $\mathcal{CP}_0$ and $\mathcal{CP}_1$ were adjacent. Furthermore, the starting positions of the two runs differ by more than one interval since otherwise the starting position of $\mathcal{CP}_1$ would coincide with the oldest interval of the current sliding window. We now consider the suffix $S$ of intervals in the stream starting at the start position of run $\mathcal{CP}_0$. This suffix $S$ is partitioned into three parts $S = ABC$, where $A$ are the intervals that arrived prior to the starting position of $\mathcal{CP}_1$, $C$ are the intervals that arrived from the moment onward when the runs $\mathcal{CP}_0$ and $\mathcal{CP}_1$ became adjacent (either after a clean-up, or they may have been adjacent from the moment when $\mathcal{CP}_1$ was created), and $B$ are the intervals between $A$ and $C$ (if there are any).

We will prove that $|\mathcal{CP}_1| = |\mathcal{CP}(BC)| \geq |OPT(ABC)| / (4 + 2\varepsilon)$, which proves the result since the current sliding window is a suffix of $ABC$. Indeed, we have the following:

|OPT(ABC)|𝑂𝑃𝑇𝐴𝐵𝐶\displaystyle|OPT(ABC)|| italic_O italic_P italic_T ( italic_A italic_B italic_C ) | =|OPT(ABC)A|absent𝑂𝑃𝑇𝐴𝐵𝐶𝐴\displaystyle=|OPT(ABC)\cap A|= | italic_O italic_P italic_T ( italic_A italic_B italic_C ) ∩ italic_A |
+|OPT(ABC)(BC)|𝑂𝑃𝑇𝐴𝐵𝐶𝐵𝐶\displaystyle\quad+|OPT(ABC)\cap(B\cup C)|+ | italic_O italic_P italic_T ( italic_A italic_B italic_C ) ∩ ( italic_B ∪ italic_C ) |
|OPT(A)|+|OPT(BC)|absent𝑂𝑃𝑇𝐴𝑂𝑃𝑇𝐵𝐶\displaystyle\leq|OPT(A)|+|OPT(BC)|≤ | italic_O italic_P italic_T ( italic_A ) | + | italic_O italic_P italic_T ( italic_B italic_C ) |
2|𝒞𝒫(A)|+2|𝒞𝒫(BC)|absent2𝒞𝒫𝐴2𝒞𝒫𝐵𝐶\displaystyle\leq 2\cdot|\mathcal{CP}(A)|+2\cdot|\mathcal{CP}(BC)|≤ 2 ⋅ | caligraphic_C caligraphic_P ( italic_A ) | + 2 ⋅ | caligraphic_C caligraphic_P ( italic_B italic_C ) | Approx. factor of 𝒞𝒫𝒞𝒫\mathcal{CP}caligraphic_C caligraphic_P (Property  C2)
2|𝒞𝒫(AB)|+2|𝒞𝒫(BC)|absent2𝒞𝒫𝐴𝐵2𝒞𝒫𝐵𝐶\displaystyle\leq 2\cdot|\mathcal{CP}(AB)|+2\cdot|\mathcal{CP}(BC)|≤ 2 ⋅ | caligraphic_C caligraphic_P ( italic_A italic_B ) | + 2 ⋅ | caligraphic_C caligraphic_P ( italic_B italic_C ) | Monotonicity, Lemma 1
2(1+ε)|𝒞𝒫(B)|+2|𝒞𝒫(BC)|absent21𝜀𝒞𝒫𝐵2𝒞𝒫𝐵𝐶\displaystyle\leq 2\cdot(1+\varepsilon)\cdot|\mathcal{CP}(B)|+2\cdot|\mathcal{% CP}(BC)|≤ 2 ⋅ ( 1 + italic_ε ) ⋅ | caligraphic_C caligraphic_P ( italic_B ) | + 2 ⋅ | caligraphic_C caligraphic_P ( italic_B italic_C ) | Property S2
2(1+ε)|𝒞𝒫(BC)|+2|𝒞𝒫(BC)|absent21𝜀𝒞𝒫𝐵𝐶2𝒞𝒫𝐵𝐶\displaystyle\leq 2\cdot(1+\varepsilon)\cdot|\mathcal{CP}(BC)|+2\cdot|\mathcal% {CP}(BC)|≤ 2 ⋅ ( 1 + italic_ε ) ⋅ | caligraphic_C caligraphic_P ( italic_B italic_C ) | + 2 ⋅ | caligraphic_C caligraphic_P ( italic_B italic_C ) | Monotonicity, Lemma 1
(4+2ε)|𝒞𝒫(BC)|=(4+2ε)|𝒞𝒫1|.absent42𝜀𝒞𝒫𝐵𝐶42𝜀𝒞subscript𝒫1\displaystyle\leq(4+2\varepsilon)\cdot|\mathcal{CP}(BC)|=(4+2\varepsilon)\cdot% |\mathcal{CP}_{1}|\ .≤ ( 4 + 2 italic_ε ) ⋅ | caligraphic_C caligraphic_P ( italic_B italic_C ) | = ( 4 + 2 italic_ε ) ⋅ | caligraphic_C caligraphic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT | .

4.1.3 Sliding Window Algorithm

We will expand upon the smooth histogram method as described in Algorithm 4. The key idea is to exploit the structure of the regions created by the runs of $\mathcal{CP}$ in the smooth histogram algorithm. Based on these regions, we instantiate additional runs that target areas in which we expect to find many optimal intervals.

Whenever two runs $\mathcal{CP}_i$ and $\mathcal{CP}_{i+1}$ in Algorithm 3 become adjacent (either because of the clean-up operation or because a new run was created), proceed as follows:

Denote by $R_1, \dots, R_\ell$ the regions created by $\mathcal{CP}_i$ thus far.

  1. For each region $R_i$, we initiate a new run of $\mathcal{CP}$ that only considers subsequently arriving intervals that lie within $R_i$.

  2. For each pair of consecutive regions $R_i, R_{i+1}$, we initiate a new run of $\mathcal{CP}$ that considers all subsequently arriving intervals that lie within the merged region $R_i R_{i+1}$ (the region delimited by the left boundary of $R_i$ and the right boundary of $R_{i+1}$).

  3. Clean-up: The additional runs established in Steps 1 and 2 are associated with the run $\mathcal{CP}_{i+1}$, and whenever $\mathcal{CP}_{i+1}$ is deleted, all associated runs are also deleted.

  4. Output: The output is generated from the oldest active run of $\mathcal{CP}$ together with its associated runs as in Steps 1 and 2 (see the proofs of Lemmas 3 and 4).

Algorithm 4 $(\frac{11}{3} + \varepsilon)$-approximation algorithm for Interval Selection on arbitrary-length intervals
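To illustrate Steps 1 and 2, the following Python sketch computes the windows to which the additional runs would be restricted. It is only an illustration under stated assumptions: regions are represented as pairs `(left, right)` sorted from left to right, containment is tested with closed boundaries, and the names `associated_run_windows` and `lies_within` are hypothetical helpers that do not appear in the listing above.

```python
from typing import List, Tuple

# Assumption: a region (and an interval) is a pair (left, right) with left <= right.
Region = Tuple[float, float]


def associated_run_windows(regions: List[Region]) -> Tuple[List[Region], List[Region]]:
    """Return the windows of the Step 1 runs and of the Step 2 runs.

    Step 1: one window per region R_i.
    Step 2: one window per pair of consecutive regions R_i, R_{i+1}, spanning
            from the left boundary of R_i to the right boundary of R_{i+1}.
    """
    single = list(regions)
    merged = [
        (regions[i][0], regions[i + 1][1])
        for i in range(len(regions) - 1)
    ]
    return single, merged


def lies_within(interval: Region, window: Region) -> bool:
    """Check whether an arriving interval would be considered by a restricted run."""
    return window[0] <= interval[0] and interval[1] <= window[1]
```

For example, with regions `[(0, 1), (1, 3), (3, 4)]` the merged windows are `(0, 3)` and `(1, 4)`, so an interval such as `(0.5, 2)` is ignored by every Step 1 run but is picked up by the Step 2 run on the first merged window.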

We will now proceed and analyse Algorithm 4. To this end, we consider any fixed current sliding window.

First, similar to the analysis of Algorithm 3, we note that if the starting position of the oldest run of $\mathcal{CP}$, denoted $\mathcal{CP}_1$, coincides with the left delimiter of the sliding window, then we immediately obtain a $2$-approximation (by Property C2). Suppose thus that this is not the case. Again, we consider the run $\mathcal{CP}_0$, which is the latest run that has expired and was previously adjacent to $\mathcal{CP}_1$. We also consider the suffix of intervals $S = ABC$, where $A$ are the intervals starting at the starting position of $\mathcal{CP}_0$ and ending before the starting position of $\mathcal{CP}_1$, $C$ are the intervals that occurred after $\mathcal{CP}_0$ and $\mathcal{CP}_1$ became adjacent, and $B$ are the remaining intervals. Let $OPT = OPT(ABC)$, let $OPT_A = OPT \cap A$, and define $OPT_B$ and $OPT_C$ similarly. Since the current sliding window is a subset of $ABC$, an optimal solution in the current sliding window is of size at most $|OPT|$.

In Algorithm 4, we run the smooth histogram algorithm, Algorithm 3, with respect to a parameter $\beta > 0$ (i.e., replace parameter $\varepsilon$ in the listing with $\beta$). Then, as proved in Theorem 5, we always have at least a $(4 + 2\beta)$-approximation at our disposal, i.e.,

\[ |\mathcal{CP}_1| = |\mathcal{CP}(BC)| \geq \frac{|OPT|}{4 + 2\beta} \ . \]

We define $\varepsilon' \geq 0$ such that

\[ |\mathcal{CP}(BC)| = \frac{|OPT|}{4 + 2\beta - \varepsilon'} \ . \tag{2} \]

In the following, we will argue that if $\varepsilon'$ is close to $0$ then we can find a better solution using the runs associated with $\mathcal{CP}_1$.

Let $\mathcal{R}$ be the regions created by $\mathcal{CP}_0$ at the moment when $\mathcal{CP}_0$ and $\mathcal{CP}_1$ became adjacent, i.e., the regions created by the run $\mathcal{CP}(AB)$. By Property C2, we have $\ell := |\mathcal{R}| = |\mathcal{CP}(AB)|$. For each region $R_i \in \mathcal{R}$, let $x_i = |OPT_C \cap R_i|$, i.e., the number of optimal intervals in $C$ that lie within the region $R_i$. Furthermore, we define $X := \sum_{i=1}^{\ell} x_i$.

In the next lemma, we prove that, provided $\varepsilon'$ is small, the quantity $X$ is necessarily large, i.e., there are many optimal intervals in $C$ that lie within the regions $R_i$. We will later argue that the runs associated with $\mathcal{CP}_1$ can then be used to find many of these.

Lemma 2.
\[ X \geq \frac{2 - \varepsilon'}{1 + \beta} \cdot \ell \ . \]
Proof.

Observe that $|OPT| \leq X + 2\ell$, since at most $\ell$ intervals of $OPT$ can intersect the region boundaries of $\mathcal{R}$, another $\ell$ intervals of $OPT_A \cup OPT_B$ can lie within the $\ell$ regions, and the remaining ones are the $X$ intervals of $OPT_C$.

Then, using Property S2, Lemma 1, and Equation (2), we obtain:

\[ \ell = |\mathcal{CP}(AB)| \leq (1 + \beta) |\mathcal{CP}(B)| \leq (1 + \beta) |\mathcal{CP}(BC)| = \frac{(1 + \beta) |OPT|}{4 + 2\beta - \varepsilon'} \leq \frac{(1 + \beta)(X + 2\ell)}{4 + 2\beta - \varepsilon'} \ , \]

which implies the result. ∎
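For completeness, the final implication can be spelled out as follows. This is a routine rearrangement of the chain of inequalities above, assuming $4 + 2\beta - \varepsilon' > 0$ so that multiplying through preserves the direction of the inequality:

```latex
\begin{align*}
\ell \leq \frac{(1+\beta)(X + 2\ell)}{4 + 2\beta - \varepsilon'}
&\iff (4 + 2\beta - \varepsilon')\,\ell \leq (1+\beta)\,X + (2 + 2\beta)\,\ell \\
&\iff (2 - \varepsilon')\,\ell \leq (1+\beta)\,X \\
&\iff X \geq \frac{2 - \varepsilon'}{1 + \beta} \cdot \ell \ .
\end{align*}
```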

Consider now the run $\mathcal{CP}(B)$, which coincides with the run $\mathcal{CP}_1$ until $\mathcal{CP}_0$ and $\mathcal{CP}_1$ became adjacent. Let $B_1$ be the intervals computed by this run that do not intersect the boundaries of $\mathcal{R}$ and let $B_2$ be the intervals computed by this run that intersect the boundaries of $\mathcal{R}$. Then, since $|B_1| + |B_2| = |\mathcal{CP}(B)|$, we have that either $|B_1| \geq \frac{1}{3} |\mathcal{CP}(B)|$ or $|B_2| \geq \frac{2}{3} |\mathcal{CP}(B)|$. We treat both cases separately in Lemmas 3 and 4:

Lemma 3.

Suppose that $|B_1| \geq \frac{1}{3} |\mathcal{CP}(B)|$. Then, using the associated runs of $\mathcal{CP}_1$, we can output a solution of size at least $\frac{7 - 3\varepsilon'}{6(1 + \beta)} \ell$.

Proof.

We call a region $R_i$ good if it contains an interval from $B_1$. We output the solution obtained from the runs of $\mathcal{CP}$ on $R_i$, for all $i$, and if such a run on a good region leads to no intervals (i.e., $x_i = 0$), then we output the interval from $B_1$ instead. Recall that $\mathcal{CP}$ outputs a solution of size at least $\frac{x_i + 1}{2}$ if $x_i \neq 0$ (Property C2), and we stress here that the additive $+1$ is key for our analysis. We thus obtain a solution of size at least:

\[ S = \sum_{R_i \text{ good}} \max\left\{ \frac{x_i + 1}{2}, 1 \right\} + \sum_{R_i \text{ bad},\, x_i \neq 0} \frac{x_i + 1}{2} \ . \]

Recall that $\sum_{i=1}^{\ell} x_i = X \geq \frac{2 - \varepsilon'}{1 + \beta} \cdot \ell$. Hence,

\begin{align*}
S &\geq \sum_{R_i \text{ good}} \frac{x_i + 1}{2} + \sum_{R_i \text{ bad},\, x_i \neq 0} \frac{x_i + 1}{2} \\
&\geq \frac{X + |B_1|}{2} && |B_1| \text{ is the number of good regions} \\
&\geq \frac{2 - \varepsilon'}{2(1 + \beta)} \ell + \frac{1}{6} |\mathcal{CP}(B)| && \text{By Lemma 2 and } |B_1| \geq \tfrac{1}{3} |\mathcal{CP}(B)| \\
&\geq \frac{2 - \varepsilon'}{2(1 + \beta)} \ell + \frac{\ell}{6(1 + \beta)} && |\mathcal{CP}(B)| \geq \frac{|\mathcal{CP}(AB)|}{1 + \beta} = \frac{\ell}{1 + \beta} \text{ by Prop. S2} \\
&\geq \frac{7 - 3\varepsilon'}{6(1 + \beta)} \ell \ .
\end{align*}
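The step $S \geq (X + |B_1|)/2$ can be sanity-checked on small instances. The Python sketch below evaluates the expression for $S$ from the proof for given values $x_i$ and good/bad labels; the function name `lemma3_solution_size` and the example values are purely illustrative assumptions, and the check is no substitute for the proof.

```python
from fractions import Fraction
from typing import Sequence


def lemma3_solution_size(xs: Sequence[int], good: Sequence[bool]) -> Fraction:
    """Evaluate S = sum over good regions of max{(x_i + 1)/2, 1}
    plus the sum over bad regions with x_i != 0 of (x_i + 1)/2."""
    total = Fraction(0)
    for x, is_good in zip(xs, good):
        if is_good:
            # A good region contributes at least the interval from B_1.
            total += max(Fraction(x + 1, 2), Fraction(1))
        elif x != 0:
            total += Fraction(x + 1, 2)
    return total


# Hypothetical example: five regions, three of them good.
xs = [0, 3, 0, 2, 1]
good = [True, True, False, False, True]
S = lemma3_solution_size(xs, good)
X = sum(xs)
num_good = sum(good)
assert S >= Fraction(X + num_good, 2)
```

Each good region contributes at least $(x_i + 1)/2$ and each bad region at least $x_i / 2$, which is why the number of good regions appears as the additive term in the bound.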

Lemma 4.

Suppose that $|B_2| \geq \frac{2}{3} |\mathcal{CP}(B)|$. Then, using the associated runs of $\mathcal{CP}_1$, we can output a solution of size at least $\frac{7 - 3\varepsilon'}{6(1 + \beta)} \ell$.

Proof.

Let $B_2^1 \subseteq B_2$ be the intervals of $B_2$ that lie on the boundary between two regions $R_i$ and $R_{i+1}$ where $i$ is even, and let $B_2^2 = B_2 \setminus B_2^1$. Then, either $|B_2^1| \geq \frac{1}{2} |B_2|$ or $|B_2^2| \geq \frac{1}{2} |B_2|$.

Suppose that $|B_2^1| \geq \frac{1}{2} |B_2|$. We only analyse this case since the other case is similar.

We call an even index $i$ good if there is an interval in $B_2^1$ that lies on $R_i R_{i+1}$. We consider the runs of $\mathcal{CP}$ on pairs of regions $R_i R_{i+1}$ where $i$ is even. Then, we find a solution of size:

\begin{align*}
S &= \sum_{2k \text{ good}} \max\left\{ \frac{x_{2k} + x_{2k+1} + 1}{2}, 1 \right\} + \sum_{\substack{2k \text{ bad} \\ x_{2k} + x_{2k+1} \neq 0}} \frac{x_{2k} + x_{2k+1} + 1}{2} \\
&\geq \sum_{2k \text{ good}} \frac{x_{2k} + x_{2k+1} + 1}{2} + \sum_{\substack{2k \text{ bad} \\ x_{2k} + x_{2k+1} \neq 0}} \frac{x_{2k} + x_{2k+1} + 1}{2} \ .
\end{align*}

Using the identity $X = \sum_{i=1}^{\ell} x_i$ and that $|B_2^1|$ is the number of good regions and proceeding as in the proof of Lemma 3, we obtain:

\[ S \geq \frac{X + |B_2^1|}{2} \geq \frac{2 - \varepsilon'}{2(1 + \beta)} \ell + \frac{1}{6} |\mathcal{CP}(B)| \geq \dots \geq \frac{7 - 3\varepsilon'}{6(1 + \beta)} \ell \ . \]

Theorem 6.

For any constant $\delta > 0$, Algorithm 4 is a $(11/3 + \delta)$-approximation sliding window algorithm for Interval Selection on arbitrary-length intervals that uses space $\tilde{\mathrm{O}}(|OPT|)$.

Proof.

The naive smooth histogram method gives us a solution of size

\[ |\mathcal{CP}(BC)| \geq |\mathcal{CP}(B)| \geq \frac{|\mathcal{CP}(AB)|}{1 + \beta} \geq \frac{\ell}{1 + \beta} \ , \]

where we used the monotonicity of $\mathcal{CP}$ (Lemma 1) and Property S2. Using the associated runs, by Lemmas 3 and 4, we get a solution of size at least

\[ \frac{7 - 3\varepsilon'}{6(1 + \beta)} \ell \ . \]

Since we can output the larger of the two solutions, the worst case occurs when both solutions have the same value, i.e., when:

\[ \frac{\ell}{1 + \beta} = \frac{7 - 3\varepsilon'}{6(1 + \beta)} \ell \ , \]

which implies $\varepsilon' = \frac{1}{3}$. The approximation factor thus is, for any $\delta > 0$:

\[ 4 + 2 \cdot \beta - \varepsilon' + \delta = 11/3 + 2 \cdot \beta + \delta \ . \]

Choosing $\beta = \frac{1}{2}\delta$, and rescaling $\delta$ to $\frac{1}{2}\delta$ gives the result.
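The arithmetic behind the balancing step can be reproduced with exact rationals. The sketch below is illustrative only; the function names `balancing_eps_prime` and `approximation_factor` are hypothetical, and the code merely restates the two formulas from the proof.

```python
from fractions import Fraction


def balancing_eps_prime() -> Fraction:
    """Solve 1 = (7 - 3 * eps') / 6 for eps'.

    The common factor ell / (1 + beta) on both sides of the balancing
    equation cancels, leaving 6 = 7 - 3 * eps'.
    """
    return Fraction(7 - 6, 3)


def approximation_factor(beta: Fraction, eps_prime: Fraction) -> Fraction:
    """Factor 4 + 2 * beta - eps' guaranteed by Equation (2)."""
    return 4 + 2 * beta - eps_prime


eps_prime = balancing_eps_prime()
# With beta tending to 0, the factor tends to 11/3.
limit_factor = approximation_factor(Fraction(0), eps_prime)
```

Any fixed $\beta > 0$ adds $2\beta$ to this limit, which is the slack absorbed by $\delta$ in the statement of the theorem.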

As a consequence of Property S1, as previously established, the smooth histogram algorithm uses $\tilde{\mathrm{O}}(|OPT|)$ space. It remains to argue that the runs created in Steps 1 and 2 of Algorithm 4 only increase the space requirements by a constant times $|OPT|$.

Indeed, for a fixed instance $\mathcal{CP}_i$, all the runs created by Step 1 are pairwise disjoint (they do not store common intervals), so their cumulative space is $O(|OPT|)$, as we assumed the memory required to store an interval is $O(1)$. Similarly, for the runs created by Step 2, an interval appears in at most two such runs. So, the cumulative space is again $O(|OPT|)$. Therefore, the total number of intervals stored in the associated runs is at most $O(|OPT|)$, completing the proof. ∎

The analysis of the approximation factor of the algorithm is shown to be tight in Appendix A, meaning that the algorithm does not achieve an approximation factor better than $\frac{11}{3}$.

4.2 Space Lower Bound

We now give our space lower bound for sliding window algorithms for Interval Selection on arbitrary-length intervals. Our result is established by a reduction to the three-party communication problem $\textsf{Chain}_3$.

Theorem 7.

Let $\varepsilon > 0$ be any small constant. Then, any algorithm in the sliding window model that computes a $(2.5 - \varepsilon)$-approximate solution to Interval Selection on arbitrary-length intervals requires a memory of size $\Omega(L)$.

Proof.

Let $\mathcal{A}$ be a sliding window algorithm with approximation factor $2.5 - \varepsilon$, for some $\varepsilon > 0$, as in the statement of the theorem, and let $n = \frac{L - 2}{3}$, where $L$ is the window length. We will argue how $\textsf{Chain}_3(n)$ can be solved with the help of $\mathcal{A}$.

To this end, denote the three parties in the communication problem $\textsf{Chain}_3(n)$ by Alice, Bob, and Charlie. Let $X_1 \in \{0, 1\}^n$ be Alice's input, let $X_2 \in \{0, 1\}^n$ and $J_1 \in [n]$ be Bob's input, and let $J_2 \in [n]$ be Charlie's input. The players proceed as follows:

  • Alice: For every $i \in [n]$, Alice feeds the following intervals into $\mathcal{A}$:

    \begin{align*}
    I_1(i) &= \begin{cases} \left[ \frac{i}{3n}, 1 + \frac{i}{3n} \right], & \text{if } X_1[i] = 1 \ , \\ [-10 - i, 10 + i], & \text{if } X_1[i] = 0 \ . \end{cases} \\
    I_2(i) &= \begin{cases} \left[ 2 - \frac{i}{3n}, 3 - \frac{i}{3n} \right], & \text{if } X_2[i] = 1 \ , \\ [-11 - i, 11 + i], & \text{if } X_2[i] = 0 \ . \end{cases}
    \end{align*}

    The order given is I1(1),I2(1),I1(2),I2(2),,I1(n),I2(n)subscript𝐼11subscript𝐼21subscript𝐼12subscript𝐼22subscript𝐼1𝑛subscript𝐼2𝑛I_{1}(1),I_{2}(1),I_{1}(2),I_{2}(2),\dots,I_{1}(n),I_{2}(n)italic_I start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( 1 ) , italic_I start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( 1 ) , italic_I start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( 2 ) , italic_I start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( 2 ) , … , italic_I start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_n ) , italic_I start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_n ). We observe that, for every i[n]𝑖delimited-[]𝑛i\in[n]italic_i ∈ [ italic_n ], when X1[i]=1subscript𝑋1delimited-[]𝑖1X_{1}[i]=1italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT [ italic_i ] = 1, the intervals I1(i)subscript𝐼1𝑖I_{1}(i)italic_I start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_i ) and I2(i)subscript𝐼2𝑖I_{2}(i)italic_I start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_i ) are disjoint. Alice sends the memory state of 𝒜𝒜\mathcal{A}caligraphic_A to Bob.

  • Bob: For every $i\in[n+2(J_1-1)]$, Bob feeds the following interval into $\mathcal{A}$:

    \[
    I_3(i) = \begin{cases} \left[1+\frac{J_1}{3n}+\frac{1}{6n}+\frac{i-1}{6n^2},\; 2-\frac{J_1}{3n}-\frac{1}{6n}+\frac{i-1}{6n^2}\right], & \text{if } i\leq n \text{ and } X_2[i]=1\ , \\ [-10-i,\, 11+i], & \text{otherwise}\ . \end{cases}
    \]

    Let $k\in[n]$. Notice that, for every $j\in[n+2(J_1-1)]$, when $X_1[k]=X_2[j]=1$, the interval $I_3(j)$ is disjoint from both $I_1(k)$ and $I_2(k)$ if and only if $k\leq J_1$. Otherwise, $I_3(j)$ intersects both $I_1(k)$ and $I_2(k)$.

    Bob sends the memory state of $\mathcal{A}$ and $J_1$ to Charlie.

  • Charlie: We denote the interval boundaries of $I_3(i)$ by $a_{I_3(i)}$ and $b_{I_3(i)}$, i.e., $I_3(i)=[a_{I_3(i)},b_{I_3(i)}]$. Charlie feeds the following two intervals into $\mathcal{A}$:

    \begin{align*}
    I_{J_2} &= \left[\frac{2a_{I_3(J_2-1)}+a_{I_3(J_2)}}{3},\; \frac{a_{I_3(J_2-1)}+2a_{I_3(J_2)}}{3}\right]\ , \mbox{ and} \\
    I'_{J_2} &= \left[\frac{2b_{I_3(J_2-1)}+b_{I_3(J_2)}}{3},\; \frac{b_{I_3(J_2-1)}+2b_{I_3(J_2)}}{3}\right]\ .
    \end{align*}

    Notice that $I_{J_2}$ intersects all intervals $I_3(i)$ with $i<J_2$, while $I'_{J_2}$ intersects all intervals $I_3(i)$ with $i>J_2$.

    Using $\mathcal{A}$, Charlie computes an approximation to a largest independent set $OPT$ of

    \[
    \mathcal{I}=\{I_1(k) \mid J_1\leq k\leq n\}\cup\{I_2(k) \mid J_1\leq k\leq n\}\cup\{I_3(k) \mid 1\leq k\leq n+2(J_1-1)\}\cup\{I_{J_2},I'_{J_2}\}\ ,
    \]

    which is possible since $\mathcal{A}$ is a sliding window algorithm and thus handles the situation in which the intervals $\bigcup_{1\leq k<J_1}\{I_1(k),I_2(k)\}$ have expired.
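The disjointness observation in Bob's step can be checked mechanically. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that $I_2(i)$ is driven by Alice's bit $X_1[i]$ and $I_3(i)$ by Bob's bit $X_2[i]$. It uses exact rational arithmetic and tests, for all-ones inputs, that $I_3(j)$ avoids both $I_1(k)$ and $I_2(k)$ exactly when $k\leq J_1$.

```python
from fractions import Fraction as F

def alice_intervals(x1, i):
    """Return (I1(i), I2(i)) as closed intervals (lo, hi); i is 1-based."""
    n = len(x1)
    if x1[i - 1] == 1:
        s = F(i, 3 * n)
        return (s, 1 + s), (2 - s, 3 - s)
    return (F(-10 - i), F(10 + i)), (F(-11 - i), F(11 + i))

def bob_interval(x2, j1, i):
    """Return I3(i) for 1 <= i <= n + 2*(j1 - 1); i is 1-based."""
    n = len(x2)
    if i <= n and x2[i - 1] == 1:
        base = F(j1, 3 * n) + F(1, 6 * n)
        shift = F(i - 1, 6 * n * n)
        return (1 + base + shift, 2 - base + shift)
    return (F(-10 - i), F(11 + i))  # padding / zero-bit interval

def disjoint(p, q):
    """Closed intervals are disjoint iff one ends strictly before the other starts."""
    return p[1] < q[0] or q[1] < p[0]

def check_claim(n):
    """With all bits set to 1: I3(j) avoids I1(k) and I2(k) iff k <= J1,
    and otherwise intersects both."""
    ones = [1] * n
    for j1 in range(1, n + 1):
        for k in range(1, n + 1):
            i1, i2 = alice_intervals(ones, k)
            for j in range(1, n + 1):
                i3 = bob_interval(ones, j1, j)
                avoids_both = disjoint(i3, i1) and disjoint(i3, i2)
                hits_both = not disjoint(i3, i1) and not disjoint(i3, i2)
                if avoids_both != (k <= j1) or hits_both != (k > j1):
                    return False
    return True
```

Calling `check_claim(n)` for small `n` exercises every combination of $J_1$, $k$, and $j$; the check covers only the all-ones case and is not a substitute for the argument in the proof.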

Figure 5 provides an illustration of the proof of Theorem 7.

Figure 5: Illustration of the intervals created by Alice, Bob, and Charlie in the proof of Theorem 7 for an instance of $\textsf{Chain}_3(n)$, where $n=\frac{L-2}{3}$, with $X_1[J_1]=X_2[J_2]=1$. The red intervals in the figure ($I_1(1)$, $I_1(J_1-1)$, $I_2(1)$, $I_2(J_1-1)$) correspond to expired intervals. The optimal solution is $\{I_1(J_1),I_2(J_1),I_{J_2},I'_{J_2},I_3(J_2)\}$ of size 5. Otherwise, if $X_1[J_1]=X_2[J_2]=0$, the optimal solution would have been of size 2. All intervals $I_3(i)$ are disjoint from $I_1(J_1)$ and $I_2(J_1)$. However, they intersect $I_1(J_1+1)$ and $I_2(J_1+1)$, as emphasized by the vertical dashed lines. Intervals $I_3(i)$ for $n<i\leq n+2(J_1-1)$ have been omitted, as they do not impact the optimal solution and their only role is to advance the sliding window.

The total number of intervals fed by the three players is $3n+2+2(J_1-1)=L+2(J_1-1)$. So, once Charlie's two intervals have been fed into $\mathcal{A}$, the $2(J_1-1)$ oldest intervals have expired, and the active window indeed consists of exactly the intervals in $\mathcal{I}$.
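The window bookkeeping can be replayed with a few lines of Python. This is a minimal sketch with hypothetical names (`final_window` and the string labels are not from the paper); it lists the stream in arrival order as labels and keeps the last $L=3n+2$ entries.

```python
def final_window(n, j1):
    """Labels of the L = 3n + 2 most recent intervals once all players are done."""
    window_len = 3 * n + 2
    stream = []
    for i in range(1, n + 1):                 # Alice: I1(1), I2(1), ..., I1(n), I2(n)
        stream += [("I1", i), ("I2", i)]
    for i in range(1, n + 2 * (j1 - 1) + 1):  # Bob: I3(1), ..., I3(n + 2(J1 - 1))
        stream.append(("I3", i))
    stream += [("IJ2", 0), ("IJ2'", 0)]       # Charlie's two intervals
    assert len(stream) == window_len + 2 * (j1 - 1)
    return stream[-window_len:]               # the 2(J1 - 1) oldest entries expire
```

For example, with `n = 5` and `j1 = 3`, the four oldest labels `I1(1), I2(1), I1(2), I2(2)` fall out of the window and the first surviving label is `("I1", 3)`.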

We will argue now that if $X_1[J_1]=X_2[J_2]=1$ then the optimal solution size is $5$, while if $X_1[J_1]=X_2[J_2]=0$ then the optimal solution size is $2$.

Suppose thus that $X_1[J_1]=X_2[J_2]=1$. Then it is not hard to see that the unique optimal solution is $\{I_1(J_1),I_2(J_1),I_3(J_2),I_{J_2},I'_{J_2}\}$ of size $5$.

Next, suppose that $X_1[J_1]=X_2[J_2]=0$. Notice first that, in this case, $I_1(J_1)$, $I_2(J_1)$, and $I_3(J_2)$ intersect every other interval in the input, so they can only belong to independent sets of size at most $1$.

Also, any interval $I_1(i)$ with $i>J_1$ blocks all the intervals $I_3(j)$, for $1\leq j\leq n+2(J_1-1)$, as well as $I_{J_2}$. So, an interval $I_1(i)$ with $i>J_1$ can only be part of an independent set of size at most $2$ (either $\{I_1(i),I'_{J_2}\}$ or $\{I_1(i),I_2(j)\}$ for some $j>J_1$). Similarly, $I_2(i)$ with $i>J_1$ can only be part of an independent set of size at most $2$. Furthermore, any solution constructed solely from Bob's and Charlie’s intervals has size at most $2$ (similar to the $\frac{3}{2}-\varepsilon$ lower bound construction of [10]). The size of an optimal solution is thus $2$ in this case.

Recall that $\mathcal{A}$ has an approximation factor of $2.5-\varepsilon$. Hence, if $X_1[J_1]=X_2[J_2]=1$ then $\mathcal{A}$ reports a solution of size at least $3$, thereby distinguishing it from the case when $X_1[J_1]=X_2[J_2]=0$, which yields an optimal size of $2$.
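The threshold of $3$ follows from a one-line calculation, spelled out here under the assumption that the size of the solution reported by $\mathcal{A}$, written $|\mathcal{A}(\mathcal{I})|$ (a notation introduced only for this remark), is an integer that is at least $|OPT|/(2.5-\varepsilon)$ and at most $|OPT|$:

```latex
\[
  |OPT| = 5 \;\Longrightarrow\; |\mathcal{A}(\mathcal{I})| \;\geq\; \frac{5}{2.5-\varepsilon} \;>\; \frac{5}{2.5} \;=\; 2
  \;\Longrightarrow\; |\mathcal{A}(\mathcal{I})| \geq 3 ,
  \qquad\qquad
  |OPT| = 2 \;\Longrightarrow\; |\mathcal{A}(\mathcal{I})| \leq 2 .
\]
```

The strict inequality uses $\varepsilon>0$, and the final step on the left uses integrality of the solution size.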

Since, by Theorem 2, $\textsf{Chain}_3(n)$ requires a message of size $\Omega(n)$, and since the protocol solely consists of forwarding the memory state of $\mathcal{A}$, we conclude that $\mathcal{A}$ requires a memory of size $\Omega(n)=\Omega(L)$, which completes the proof. ∎

5 Conclusion

In this paper, we initiated the study of the Interval Selection problem in the sliding window model of computation. We gave algorithms and lower bounds for both unit-length and arbitrary-length intervals. In the unit-length case, we gave a $2$-approximation algorithm that uses space $O(|OPT|)$, and we showed that this is best possible in that any $(2-\varepsilon)$-approximation algorithm requires space $\Omega(L)$. In the arbitrary-length case, we gave a $(\frac{11}{3}+\varepsilon)$-approximation algorithm that uses space $\tilde{\mathrm{O}}(|OPT|)$, and we showed that any $(\frac{5}{2}-\varepsilon)$-approximation algorithm requires space $\Omega(L)$. Contrasted with results known from the one-pass streaming setting, our results imply that Interval Selection in both the unit-length and the arbitrary-length cases is harder to solve in the sliding window setting than in the one-pass streaming setting.

We conclude with two open questions.

First, the approximation guarantees of our algorithm for arbitrary-length intervals and our respective lower bound do not match. Can we close this gap?

Second, the sliding window model has received significantly less attention for the study of graph problems than the traditional one-pass streaming setting. While, from a theoretical perspective, the sliding window model is less clean than the one-pass streaming model, as discussed in the introduction, it is the more suitable model for many applications. We are particularly interested in understanding the differences between the two models. For example, which graph problems can be solved equally well in the sliding window model as in the one-pass streaming setting, and which problems are significantly harder to solve?

References

  • [1] Cezar-Mihail Alexandru, Pavel Dvorák, Christian Konrad, and Kheeran K. Naidu. Improved weighted matching in the sliding window model. In Petra Berenbrink, Patricia Bouyer, Anuj Dawar, and Mamadou Moustapha Kanté, editors, 40th International Symposium on Theoretical Aspects of Computer Science, STACS 2023, March 7-9, 2023, Hamburg, Germany, volume 254 of LIPIcs, pages 6:1–6:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.STACS.2023.6.
  • [2] Ainesh Bakshi, Nadiia Chepurko, and David P. Woodruff. Weighted maximum independent set of geometric objects in turnstile streams. In International Workshop on Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 2019. URL: https://api.semanticscholar.org/CorpusID:67856291.
  • [3] Leyla Biabani, Mark de Berg, and Morteza Monemizadeh. Maximum-weight matching in sliding windows and beyond. 2021. URL: https://api.semanticscholar.org/CorpusID:245276580.
  • [4] Vladimir Braverman and Rafail Ostrovsky. Smooth histograms for sliding windows. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2007), October 20-23, 2007, Providence, RI, USA, Proceedings, pages 283–293. IEEE Computer Society, 2007. doi:10.1109/FOCS.2007.55.
  • [5] Sergio Cabello and Pablo Pérez-Lantero. Interval selection in the streaming model. Theor. Comput. Sci., 702:77–96, 2017. doi:10.1016/j.tcs.2017.08.015.
  • [6] Graham Cormode, Jacques Dark, and Christian Konrad. Independent sets in vertex-arrival streams. ArXiv, abs/1807.08331, 2018. URL: https://api.semanticscholar.org/CorpusID:49907556.
  • [7] Michael S. Crouch, Andrew McGregor, and Daniel M. Stubbs. Dynamic graphs in the sliding-window model. In Hans L. Bodlaender and Giuseppe F. Italiano, editors, Algorithms - ESA 2013 - 21st Annual European Symposium, Sophia Antipolis, France, September 2-4, 2013. Proceedings, volume 8125 of Lecture Notes in Computer Science, pages 337–348. Springer, 2013. doi:10.1007/978-3-642-40450-4_29.
  • [8] Jacques Dark, Adithya Diddapur, and Christian Konrad. Interval selection in data streams: Weighted intervals and the insertion-deletion setting. In Foundations of Software Technology and Theoretical Computer Science, 2023. URL: https://api.semanticscholar.org/CorpusID:266192962.
  • [9] Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Maintaining stream statistics over sliding windows: (extended abstract). In Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’02, pages 635–644, USA, 2002. Society for Industrial and Applied Mathematics.
  • [10] Yuval Emek, Magnús M. Halldórsson, and Adi Rosén. Space-constrained interval selection. ACM Trans. Algorithms, 12(4):51:1–51:32, 2016. doi:10.1145/2886102.
  • [11] Moran Feldman, Ashkan Norouzi-Fard, Ola Svensson, and Rico Zenklusen. The one-way communication complexity of submodular maximization with applications to streaming and robustness. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, pages 1363–1374, New York, NY, USA, 2020. Association for Computing Machinery. doi:10.1145/3357713.3384286.
  • [12] T. S. Jayram, Ravi Kumar, and D. Sivakumar. The one-way communication complexity of hamming distance. Theory Comput., 4:129–135, 2008. URL: https://api.semanticscholar.org/CorpusID:15825208.
  • [13] Robert Krauthgamer and David Reitblat. Almost-smooth histograms and sliding-window graph algorithms. Algorithmica, 84(10):2926–2953, 2022. doi:10.1007/s00453-022-00988-y.
  • [14] Eyal Kushilevitz and Noam Nisan. Communication complexity. Cambridge University Press, 1997.
  • [15] Ami Paz and Gregory Schwartzman. A $(2+\epsilon)$-approximation for maximum weight matching in the semi-streaming model. ACM Trans. Algorithms, 15(2):18:1–18:15, 2019. doi:10.1145/3274668.
  • [16] Sai Krishna Chaitanya Nalam Venkata Subrahmanya. Vertex cover in the sliding window model. Master’s thesis, Rutgers, The State University of New Jersey, 2021.
  • [17] Janani Sundaresan. Optimal communication complexity of chained index, 2024. arXiv:2404.07026.

Appendix A Hard Instance for the Analysis of Algorithm 4

A.1 Description of the Instance

We will present a hard instance demonstrating that the analysis of Algorithm 4 is tight. We assume that the smooth histogram parameter $\beta$ is set to $\beta=0$. This is a reasonable assumption since the approximation factor of the algorithm approaches the optimal value of $\frac{11}{3}$ when $\beta\rightarrow 0$.

As before, let $\mathcal{CP}_1$ be the oldest still active instance created by the smooth histogram algorithm, and let $\mathcal{CP}_0$ be the expired instance which came before $\mathcal{CP}_1$. Given $\mathcal{CP}_0$ and $\mathcal{CP}_1$, we can divide the active stream into successive parts $A,B,C$. $A$ represents the intervals that arrived before the starting position of $\mathcal{CP}_1$. $C$ represents the intervals that arrived right after the runs $\mathcal{CP}_0$ and $\mathcal{CP}_1$ became adjacent (i.e., after all the instances between $\mathcal{CP}_0$ and $\mathcal{CP}_1$ are deleted). The intervals arriving after $A$ but before $C$ are denoted as $B$.

The smooth histogram condition then translates to $\mathcal{CP}(AB)=\mathcal{CP}(A)$.

Let $\ell$ be a positive integer divisible by 3. We will first give the full stream in Algorithm 5 and then explain the purpose of each portion of the stream.

Algorithm 5 Hard instance stream $S = ABC$ for Algorithm 4

Stream $A$

Stream $A_1$

1: for $x$ from $1$ to $\ell$ do
2:     Insert $[x+0.1, x+1]$

Stream $A_2$

1: for $x$ from $1$ to $\ell$ do
2:     Insert $[x+0.5, x+0.54]$

Stream $A_3$

1: for $x$ from $1$ to $\ell$ do
2:     Insert $[x+0.95, x+1.05]$

Stream $B$

1: for $x$ from $1$ to $\ell$ do
2:     if $x = 3k+2$ for integer $k$ then
3:         Insert $[x-0.1, x+0.26]$
4:         Insert $[x+0.53, x+0.71]$
5:         Insert $[x+0.9, x+1.1]$

Stream $C$

Stream $C_1$:

1: for $x$ from $1$ to $\ell$ do
2:     if $x = 3k+2$ for integer $k$ then
3:         Insert $[x+0.06, x+0.3]$
4:         Insert $[x+0.35, x+0.75]$

Stream $C_2$:

1: for $x$ from $1$ to $\ell$ do
2:     if $x = 3k+2$ for integer $k$ then
3:         Insert $[x+0.06, x+0.15]$
4:         Insert $[x+0.25, x+0.35]$
5:         Insert $[x+0.55, x+0.6]$
6:         Insert $[x+0.7, x+0.8]$
7:         Insert $[x+0.9, x+0.94]$
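The stream of Algorithm 5 can also be generated programmatically. The following is a minimal sketch, not part of the original construction: the function name `hard_instance`, the dictionary layout, and the use of exact fractions (to avoid floating-point endpoints) are illustrative assumptions.

```python
from fractions import Fraction


def hard_instance(ell):
    """Build the substreams of Algorithm 5 as lists of closed intervals (left, right)."""
    if ell <= 0 or ell % 3 != 0:
        raise ValueError("ell must be a positive integer divisible by 3")

    def interval(x, lo, hi):
        # Offsets are given in hundredths so that all endpoints stay exact.
        return (x + Fraction(lo, 100), x + Fraction(hi, 100))

    xs = range(1, ell + 1)
    good = [x for x in xs if x % 3 == 2]  # positions with x = 3k + 2
    b_offsets = [(-10, 26), (53, 71), (90, 110)]
    c1_offsets = [(6, 30), (35, 75)]
    c2_offsets = [(6, 15), (25, 35), (55, 60), (70, 80), (90, 94)]
    return {
        "A1": [interval(x, 10, 100) for x in xs],
        "A2": [interval(x, 50, 54) for x in xs],
        "A3": [interval(x, 95, 105) for x in xs],
        "B": [interval(x, lo, hi) for x in good for lo, hi in b_offsets],
        "C1": [interval(x, lo, hi) for x in good for lo, hi in c1_offsets],
        "C2": [interval(x, lo, hi) for x in good for lo, hi in c2_offsets],
    }


stream = hard_instance(6)
order = ["A1", "A2", "A3", "B", "C1", "C2"]  # arrival order of S = ABC
print({name: len(stream[name]) for name in order})
```

Concatenating the lists in the order given by `order` yields the full stream $S = ABC$.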

We call the created regions $[x, x+1)$ with $x = 3k+2$, for some integer $k$, good. Notice that the number of regions created by $\mathcal{CP}(A)$ is $\ell$, while the number of good regions is exactly $\ell/3$ because $\ell$ is divisible by 3.

We will now discuss the instance created in Algorithm 5, which is also sketched in Figure 6.

Figure 6: Illustration of a good region, where the vertical black lines depict its boundaries. Black intervals represent streams $A_2, A_3$. Blue intervals represent the stream $B$. The upper green intervals belong to $C_1$. The bottom green intervals belong to $C_2$. Stream $A_1$ is omitted to keep the illustration simple (these intervals are responsible for creating the regions of $\mathcal{CP}(A)$).
  • Stream $A$

    Stream $A$ has three parts in this order: $A_1, A_2, A_3$.

    • Stream $A_1$

      This stream is responsible for creating the regions $[x, x+1)$ for any integer $x \in [\ell] \setminus \{1, \ell\}$ and the regions $(-\infty, 2)$, $[\ell, \infty)$.

    • Stream $A_2$

      This stream inserts intervals inside the regions $[x, x+1)$. Notice that the interval $[x+0.5, x+0.54] \in A_2$ is completely inside the region $[x, x+1)$ and replaces the interval $[x+0.1, x+1] \in A_1$ as both the leftmost and the rightmost interval of the region.

    • Stream $A_3$

      This stream inserts intervals that intersect the boundaries of regions created by $A_1$. The stream contributes to the size of the optimal solution of the overall stream $ABC$.

  • Stream $B$

    If we execute stream $B$ immediately after stream $A$, the regions created in $A$ will remain unchanged. In $B$, we only add intervals inside or intersecting the good regions. Consider therefore a good region $R = [x, x+1)$.

    The intervals $[x-0.1, x+0.26]$ and $[x+0.9, x+1.1]$ cross the boundaries of $R$. They completely contain the boundary intervals of $A_3$ (i.e., $[x-0.05, x+0.05]$ from the previous region and $[x+0.95, x+1.05]$). The interval $[x+0.53, x+0.71] \in B$ lies inside $R$ and intersects the interval $[x+0.5, x+0.54] \in A_2$.

  • Stream $C$

    The stream $C$ is divided into $C_1, C_2$ and only adds intervals completely included within good regions. Consider therefore a good region $R = [x, x+1)$.

    • Stream $C_1$

      The purpose of the stream $C_1$ is to create new regions from the regions of the run $\mathcal{CP}(AB)$. The new regions created inside $R$ are $[x, x+0.3)$, $[x+0.3, x+0.75)$ and $[x+0.75, x+1)$.

    • Stream $C_2$

      The stream $C_2$ contributes to $|OPT \cap C|$ of our instance, where $OPT$ is an independent set of optimal size inside the stream $ABC$. The interval $[x+0.06, x+0.15] \in C_2$ intersects the interval $[x-0.1, x+0.26] \in B$, but it does not intersect the interval $[x-0.05, x+0.05] \in A_3$. The interval $[x+0.9, x+0.94] \in C_2$ has similar properties. The intervals $[x+0.25, x+0.35] \in C_2$ and $[x+0.7, x+0.8] \in C_2$ intersect the boundaries of the regions of $\mathcal{CP}(C_1)$, so they will not be saved by the algorithm. Lastly, the interval $[x+0.55, x+0.6] \in C_2$ does not intersect $[x+0.5, x+0.54] \in A_2$, but it intersects $[x+0.53, x+0.71] \in B$.

A.2 Analysis of the Instance

Here we prove that, on this instance, the output is only a $\frac{11}{3}$-approximation of the optimal solution, thereby showing that our analysis of Algorithm 4 is best possible.

Lemma 5.

The streams $A, B$ yield $\mathcal{CP}(AB) = \mathcal{CP}(B) = \ell$, hence the smooth histogram condition is obeyed.

Proof.

Notice that after the run of stream $A$, we have created the regions $[x, x+1)$ for $x \in [\ell] \setminus \{1, \ell\}$ and the regions $(-\infty, 2)$, $[\ell, \infty)$.

Now, we consider a good region $R = [x, x+1)$. The saved interval of $A$ inside $R$ is $[x+0.5, x+0.54]$.

When the stream $B$ arrives, the intervals $[x-0.1, x+0.26]$ and $[x+0.9, x+1.1]$ cross the boundaries of $R$. The interval $[x+0.53, x+0.71] \in B$ intersects the interval $[x+0.5, x+0.54] \in A$, so only the rightmost interval of the region is changed after the stream $B$ is processed.

Since no new regions are created by $B$, we can argue that $\mathcal{CP}(AB) = \ell$ (the number of regions created by $\mathcal{CP}(A)$). Furthermore, all the intervals of $B$ are pairwise independent, so that $\mathcal{CP}(B) = |B| = \ell$, which proves the lemma. ∎
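The counting step $\mathcal{CP}(B) = |B| = \ell$ rests on the intervals of $B$ being pairwise independent, which is easy to check mechanically. The sketch below does so for one value of $\ell$; the helper names and the representation of endpoints as integer hundredths are illustrative assumptions, and the code does not simulate $\mathcal{CP}$ itself.

```python
def stream_b(ell):
    """Intervals of stream B, with endpoints expressed in hundredths (integers)."""
    offsets = [(-10, 26), (53, 71), (90, 110)]
    return [(100 * x + lo, 100 * x + hi)
            for x in range(1, ell + 1) if x % 3 == 2
            for lo, hi in offsets]


def pairwise_disjoint(intervals):
    """True if no two closed intervals share a point."""
    ordered = sorted(intervals)
    # After sorting by left endpoint it suffices to compare neighbours.
    return all(prev[1] < cur[0] for prev, cur in zip(ordered, ordered[1:]))


def intersects(p, q):
    """True if the closed intervals p and q share at least one point."""
    return p[0] <= q[1] and q[0] <= p[1]


ell = 9
b = stream_b(ell)
x = 2  # a good region [x, x+1)
a2_interval = (100 * x + 50, 100 * x + 54)
b_middle = (100 * x + 53, 100 * x + 71)
print(len(b) == ell, pairwise_disjoint(b), intersects(a2_interval, b_middle))
```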

Lemma 6.

Let $OPT$ be an optimal independent set of stream $ABC$. Then, $|OPT| \geq \frac{11\ell}{3}$.

Proof.

Inspecting the intervals given by the streams $A_2, A_3, C_2$, we see that they form an independent set. We have

  • $|A_2| = \ell$,

  • $|A_3| = \ell$, and

  • $|C_2| = 5 \cdot \frac{\ell}{3}$.

Hence, we obtain that $|A_2 \cup A_3 \cup C_2| = \frac{11\ell}{3}$, which implies $|OPT| \geq |A_2 \cup A_3 \cup C_2| = \frac{11\ell}{3}$ as required. ∎
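The independence of $A_2 \cup A_3 \cup C_2$ can be verified by a short script. The sketch below also computes the size of a maximum independent set of $ABC$ with the standard offline earliest-right-endpoint greedy; this greedy is not part of the sliding window algorithm and is used here only as a sanity check. Function names and the integer-hundredths encoding are assumptions made for illustration.

```python
def substreams(ell):
    """Return (A2 + A3 + C2, full stream ABC) with endpoints in integer hundredths."""
    xs = range(1, ell + 1)
    good = [x for x in xs if x % 3 == 2]

    def shift(positions, offsets):
        return [(100 * x + lo, 100 * x + hi) for x in positions for lo, hi in offsets]

    a1 = shift(xs, [(10, 100)])
    a2 = shift(xs, [(50, 54)])
    a3 = shift(xs, [(95, 105)])
    b = shift(good, [(-10, 26), (53, 71), (90, 110)])
    c1 = shift(good, [(6, 30), (35, 75)])
    c2 = shift(good, [(6, 15), (25, 35), (55, 60), (70, 80), (90, 94)])
    return a2 + a3 + c2, a1 + a2 + a3 + b + c1 + c2


def is_independent(intervals):
    """True if the closed intervals are pairwise disjoint."""
    ordered = sorted(intervals)
    return all(p[1] < q[0] for p, q in zip(ordered, ordered[1:]))


def greedy_opt(intervals):
    """Size of a maximum independent set (earliest-right-endpoint greedy)."""
    count, last_right = 0, float("-inf")
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if left > last_right:  # closed intervals must not even touch
            count += 1
            last_right = right
    return count


ell = 3
witness, full = substreams(ell)
print(is_independent(witness), len(witness), greedy_opt(full))
```

For $\ell = 3$ the witness set has $11 = \frac{11\ell}{3}$ intervals, and the greedy reports the same value for the full stream.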

Lemma 7.

The naive smooth histogram approach outputs a solution of size $\mathcal{CP}(BC) = \ell$.

Proof.

Recall that all intervals in $B$ and $C$ are inserted only into good regions or at the boundary of good regions. Let $[x, x+1)$ be a good region (i.e., $x = 3k+2$ for some integer $k$). We will show that $|\mathcal{CP}(BC) \cap [x-1, x+2]| = 3$ (the good region and its neighbouring regions).

After processing the $B$ stream, we have region boundaries at $x+0.26$, $x+0.71$ and $x+1.1$. Observe that all of the intervals of $C_1$ cross the region boundaries at $x+0.26$ and $x+0.71$, so they do not get saved by the run of $\mathcal{CP}$. Furthermore, the interval $[x+0.25, x+0.35] \in C_2$ crosses the region boundary at $x+0.26$, while the interval $[x+0.7, x+0.8] \in C_2$ crosses the boundary at $x+0.71$, so these intervals also do not get saved.

When processing $C$, however, the intervals $[x+0.06, x+0.15], [x+0.55, x+0.6], [x+0.9, x+0.94] \in C_2$ get inserted into the solution. Additionally, they do not change the structure of the regions created by the run $\mathcal{CP}(B)$ (i.e., they only modify the leftmost or the rightmost interval of each region created by $B$).

So, $|\mathcal{CP}(BC) \cap [x-1, x+2]| = 3$. Because there are $\ell/3$ good regions and the intervals $[x-1, x+2]$ where $x = 3k+2$ do not pairwise intersect, we have $|\mathcal{CP}(BC)| = \ell$ as required. ∎
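The boundary-crossing claims in this proof reduce to comparing a handful of offsets inside one good region. The following sketch checks them; it is not a simulation of $\mathcal{CP}$, only a test of which intervals of $C$ straddle the boundaries left behind by $B$. The function names and the convention that a boundary point belongs to the region on its right are assumptions.

```python
def crosses(interval, boundary):
    """True if the closed interval has points on both sides of the boundary.

    The boundary point itself is taken to belong to the region on its right.
    """
    left, right = interval
    return left < boundary <= right


# Offsets relative to x, in hundredths, inside one good region [x, x+1).
boundaries_after_b = [26, 71, 110]
c1 = [(6, 30), (35, 75)]
c2 = [(6, 15), (25, 35), (55, 60), (70, 80), (90, 94)]


def survivors(intervals):
    """Intervals that cross none of the boundaries created by B."""
    return [iv for iv in intervals
            if not any(crosses(iv, b) for b in boundaries_after_b)]


print(survivors(c1))
print(survivors(c2))
```

Under these assumptions no interval of $C_1$ survives, and exactly the three intervals of $C_2$ named in the proof do.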

Lemma 8.

Steps $1$ and $2$ of Algorithm 4 output a solution of size $\ell$.

Proof.

First, observe that steps 1 and 2 of Algorithm 4 are run on substream $C$. Furthermore, since only good regions contain intervals in substream $C$, it suffices to explore how steps 1 and 2 act on good regions.

In each good region, the stream $C_1$ is responsible for creating the regions of the runs of Algorithm 4. Observe that the intervals $[x+0.25, x+0.35] \in C_2$ and $[x+0.7, x+0.8] \in C_2$ cross the boundaries of these regions and are thus not stored by the algorithm. In each good region, only the intervals $[x+0.06, x+0.15], [x+0.55, x+0.6], [x+0.9, x+0.94]$ of $C_2$ get memorized. Therefore, we obtain a solution of size $3$ for each good region. Overall, the solution obtained by the runs of steps 1 and 2 of Algorithm 4 is of size $3 \cdot \frac{\ell}{3} = \ell$. ∎
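The count of $3$ intervals per good region can be reproduced by testing which intervals of $C_2$ fit entirely inside one of the regions $[x, x+0.3)$, $[x+0.3, x+0.75)$, $[x+0.75, x+1)$ created by $C_1$. The sketch below models "stored" simply as "contained in a single region", which is a simplifying assumption rather than a faithful implementation of Algorithm 4; all names are hypothetical.

```python
def region_of(interval, regions):
    """Index of the half-open region [a, b) containing the closed interval, else None."""
    left, right = interval
    for index, (a, b) in enumerate(regions):
        if a <= left and right < b:
            return index
    return None


def stored_c2(ell):
    """Intervals of C2 (in integer hundredths) that avoid all region boundaries of C1."""
    c1_regions = [(0, 30), (30, 75), (75, 100)]  # offsets within a good region
    c2_offsets = [(6, 15), (25, 35), (55, 60), (70, 80), (90, 94)]
    stored = []
    for x in range(1, ell + 1):
        if x % 3 != 2:
            continue  # only good regions receive intervals of C
        for lo, hi in c2_offsets:
            if region_of((lo, hi), c1_regions) is not None:
                stored.append((100 * x + lo, 100 * x + hi))
    return stored


ell = 6
print(len(stored_c2(ell)) == ell)
```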

Using the last two lemmas, we obtain the following conclusion:

Theorem 8.

Let $S$ be the size of the solution output by Algorithm 4 on the described input. Then, $\frac{OPT(ABC)}{S} \geq \frac{11}{3}$.

Proof.

By the previous two lemmas, both $\mathcal{CP}(BC)$ and steps 1 and 2 of Algorithm 4 output a solution of size $\ell$. Notice that the set of saved intervals of steps 1 and 2 of Algorithm 4 is a subset of the saved intervals of $\mathcal{CP}(BC)$, therefore we cannot improve the overall solution by combining both solutions. So, $S = \ell$.

By Lemma 6, $OPT(ABC) \geq \frac{11\ell}{3}$. So, we get the required conclusion. ∎
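Spelled out, the last step combines the bound of Lemma 6 with $S = \ell$:

```latex
\[
  \frac{OPT(ABC)}{S} \;\geq\; \frac{11\ell/3}{\ell} \;=\; \frac{11}{3}.
\]
```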