Interval Selection in Sliding Windows
Abstract
We initiate the study of the Interval Selection problem in the (streaming) sliding window model of computation. In this problem, an algorithm receives a potentially infinite stream of intervals on the line, and the objective is to maintain at every moment an approximation to a largest possible subset of disjoint intervals among the most recent intervals, for some integer .
We give the following results:
1. In the unit-length intervals case, we give a -approximation sliding window algorithm with space , and we show that any sliding window algorithm that computes a -approximation requires space , for any .
2. In the arbitrary-length case, we give a -approximation sliding window algorithm with space , for any constant , which constitutes our main result. We also show that space is needed for algorithms that compute a -approximation, for any . (Footnote: We use the notation to mean where factors and dependencies on are suppressed.)
Our main technical contribution is an improvement over the smooth histogram technique, which consists of running independent copies of a traditional streaming algorithm with different start times. By employing the one-pass -approximation streaming algorithm by Cabello and Pérez-Lantero [Theor. Comput. Sci. ’17] for Interval Selection on arbitrary-length intervals as the underlying algorithm, the smooth histogram technique immediately yields a -approximation in this setting. Our improvement is obtained by forwarding the structure of the intervals identified in a run to the subsequent run, which constrains the shape of an optimal solution and allows us to target optimal intervals differently.
1 Introduction
Sliding Window Model
The sliding window model of computation introduced by Datar et al. [9] captures many of the challenges that arise when processing infinite data streams. In this model, an algorithm receives an infinite stream of data items and is required to maintain, at every moment, a solution to a given problem on the current sliding window, i.e., on the most recent data items, for an integer . The objective is to design algorithms that use much less space than the size of the sliding window .
Many modern data sources are best modelled as infinite data streams rather than as data sets of large but finite sizes. For example, the sequence of Tweets on X (formerly Twitter), the sequence of IP packets passing through a network router, and continuous sensor measurements for monitoring the physical world are a priori unending. Such data sets typically constitute time-series data, where the resulting data stream is ordered with respect to the data items’ creation times. When processing such streams, it is reasonable to focus on the most recent data items (as modelled in the sliding window model by the sliding window size ) since the recent past usually affects the present more strongly than older data.
The sliding window model should be contrasted with the more traditional one-pass data streaming model. In the data streaming model, an algorithm processes a finite stream of data items and is tasked with producing a single output once all items have been processed. Similar to the sliding window model, the objective is to design algorithms that use as little space as possible, in particular, sublinear in the length of the stream. Since sliding window algorithms with can immediately be used in the data streaming model, problems are generally harder to solve in the sliding window model.
Interval Selection Problem
In this work, we initiate the study of the Interval Selection problem in the sliding window model. Given a set of intervals on the real line, the objective is to find a subset of pairwise non-overlapping intervals of maximum cardinality. The problem can also be regarded as the Maximum Independent Set problem in the interval graph associated with the intervals . We consider both the unit-length case, where all intervals are of length , and the arbitrary-length case, where no restriction on the lengths of the intervals is imposed.
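As a point of reference for the problem definition above, the offline version of Interval Selection can be solved exactly by the classical earliest-right-endpoint greedy. The following is a minimal sketch, assuming closed intervals represented as `(left, right)` pairs; the function name `max_independent_set` is chosen for illustration only and does not appear in the paper. It stores all intervals and is therefore not a sliding window algorithm.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # closed interval [left, right] (assumed representation)


def max_independent_set(intervals: List[Interval]) -> List[Interval]:
    """Offline greedy: scan by increasing right endpoint, keep what fits."""
    chosen: List[Interval] = []
    last_right = float("-inf")
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        # Closed intervals that share an endpoint are treated as overlapping.
        if left > last_right:
            chosen.append((left, right))
            last_right = right
    return chosen
```

Applied to the intervals of a single window, such a routine can serve as a ground truth when experimenting with approximate algorithms on small inputs.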
Interval Selection is fully understood in the one-pass streaming model. Emek et al. [10] gave a -approximation streaming algorithm for unit-length intervals and a -approximation streaming algorithm for arbitrary-length intervals. Both algorithms use space , where denotes an optimal solution, assuming that the space required for storing an interval is . Emek et al. also gave matching lower bounds, showing that, for both the unit-length and the arbitrary-length case, slightly better approximations require space . Subsequently, Cabello and Pérez-Lantero [5] also gave algorithms for the unit-length and the arbitrary-length cases that match the guarantees of those by Emek et al. but are significantly simpler. We will reuse one of the algorithms by Cabello and Pérez-Lantero in this paper. Last, weighted intervals as well as the insertion-deletion setting, where previously inserted intervals can be deleted again, have also been considered [8, 2], where [2] addresses the challenge of outputting the size or weight of a largest/heaviest independent set rather than outputting the intervals themselves.
The Smooth Histogram Technique
Braverman and Ostrovsky [4] introduced the smooth histogram technique, which allows deriving sliding window algorithms from traditional streaming algorithms at the expense of slightly increased space requirements and approximation guarantees. The method works as follows. Given a streaming algorithm for a specific problem P that fulfills certain smoothness properties (see [4] for details), multiple copies of are run with different starting positions in the stream. The runs are such that consecutive runs differ only slightly in solution quality, and, thus, when a run expires due to the fact that its starting position fell out of the current sliding window, the subsequent run can be used to still yield an acceptable solution. The smooth histogram technique can be applied to the Interval Selection algorithms by Emek et al. [10] and by Cabello and Pérez-Lantero [5], and we immediately obtain sliding window algorithms for both the unit-length and the arbitrary-length cases using space . For unit-length intervals, the resulting approximation factor is , for any , and for arbitrary-length intervals, the approximation factor is , for any . We will provide the analysis of the -approximation for arbitrary-length intervals in this paper (Theorem 5) since it forms the basis of the analysis of one of our algorithms.
Our Results
In this work, we show that it is possible to improve upon the guarantees obtained from the smooth histogram technique. We give deterministic sliding window algorithms and lower bounds that also apply to randomized algorithms for Interval Selection for both the unit-length and arbitrary-length cases. Our algorithms use space at any moment during the processing of the stream, where denotes an optimal solution in the current sliding window. Observe that may vary throughout the processing of the stream, and, thus, the space used by our algorithms may change accordingly.
Regarding unit-length intervals, we give a -approximation sliding window algorithm using space, and we prove that any sliding window algorithm with an approximation guarantee of , for any , requires space . Recall that, in the streaming model, a -approximation can be achieved with space . Our lower bound thus establishes a separation between the sliding window and the streaming models for unit-length intervals.
In the arbitrary-length case, we give a -approximation sliding window algorithm with space , improving over the smooth histogram technique, which constitutes our main and most technical result. We also prove that any -approximation algorithm, for any , requires space . Since, in the streaming model, a -approximation can be achieved with space , our lower bound also establishes a separation between the sliding window and the streaming models in the arbitrary-length case.
We summarize and contrast our results with results from the streaming model in Figure 1.
| | Streaming model [10, 5] | | Sliding window model (this paper) | |
|---|---|---|---|---|
| | Algorithm | LB | Algorithm | LB |
| Unit-length Intervals | | | (Thm 3) | (Thm 4) |
| Arbitrary-length Intervals | | | (Thm 6) | (Thm 7) |
A Lack of Lower Bounds in the Sliding Window Model
Interestingly, to the best of our knowledge, for graph problems (recall that the Interval Selection problem is an independent set problem on interval graphs) no separation result between the one-pass streaming and the sliding window models is known. In particular, we are not aware of any space lower bounds for graph problems specifically designed for the sliding window setting, and the only lower bounds that apply are those that carry over from the one-pass streaming setting. Our work is thus the first to establish such a separation. While our results for arbitrary-length intervals are not tight, we stress that for most problems considered, including Maximum Matching and Minimum Vertex Cover, no tight bounds are known. It is unclear whether this is due to a lack of techniques for improved algorithms or for stronger lower bounds.
Techniques
We will first discuss the key ideas behind our results for unit-length intervals, and then discuss our results for arbitrary-length intervals.
Unit-length Intervals. Our algorithm for unit-length intervals is surprisingly simple yet optimal, as established by our lower bound result. For each integer , maintain the latest interval within the current sliding window whose left endpoint lies in the interval if there is one. We argue that, if at any moment, the algorithm stores intervals, then we can extract an independent set of size at least by considering either only the intervals where is odd or where is even, while is bounded by , which establishes both the approximation factor of and the space requirements. We note that the idea of considering either only the odd or even intervals for obtaining a -approximation was previously used by [2].
Our lower bound for unit-length intervals is obtained by a reduction to the Index problem in the one-way two-party communication setting. In this setting, there are two parties, denoted Alice and Bob. Each party holds a portion of the input data. Alice sends a single message to Bob, who then outputs the result of the computation. The objective is to solve a problem using a message of smallest possible size. In Index, Alice holds a bit-string , and Bob holds an index , where , and the objective for Bob is to report the bit . It is well-known that a message of size is needed to solve the Index problem.
We argue that a sliding window algorithm for Interval Selection on unit-length intervals with approximation guarantee slightly below can be used to solve Index. To this end, Alice translates the bit-string into a clique gadget, i.e., a stack of overlapping interval slots that are slightly shifted from left to right, where interval is present in the stack if and only if . Clique gadgets have been used in all previous space lower bound constructions for intervals [10, 2, 8]. Alice then runs on these intervals and sends the memory state of to Bob. Bob subsequently feeds an interval located slightly to the right of the slot of interval into the execution of such that Bob’s interval overlaps with all interval slots at positions and overlaps with none of the interval slots at positions . The key idea of this reduction is that, since is a sliding window algorithm, it must be able to report a valid solution even if any prefix of the intervals of the stack has expired. Consider thus the situation when the intervals that are located in the first slots have expired. Then, the resulting instance has an independent set of size if and only if , otherwise a largest independent set is of size . Since the approximation factor of is below , can thus distinguish between the two cases and solve Index. Since Alice only sent the memory state of to Bob, we also obtain a space lower bound for . While this description covers the key idea of our lower bound, we note that our actual construction is slightly more involved due to an additional technical challenge. See the proof of Theorem 4 for details.
Our lower bound construction shares similarities with the lower bounds by [2] and [10], as both of these lower bounds also work with clique gadgets and special intervals that render a specific interval in the clique gadget important. In [2], a reduction to the Augmented-Index problem is given in order to obtain a space lower bound for the dynamic streaming setting, where previously inserted intervals can be deleted again at any moment. In Augmented-Index, besides the index , Bob also holds the prefix . While in our setting, intervals are deleted due to the shifting sliding window, in their lower bound, intervals are explicitly deleted by Bob.
Arbitrary-length Intervals. Our algorithm and our lower bound for arbitrary-length intervals are substantially more involved, and our -approximation algorithm constitutes the main technical result of this paper.
Our algorithm constitutes an improvement over the smooth histogram method. Using the one-pass -approximation streaming algorithm for arbitrary-length intervals by Cabello and Pérez-Lantero [5], which we abbreviate by , as the base algorithm of the smooth histogram method, we immediately obtain a -approximation sliding window algorithm using space. The key idea of the method is to maintain various runs of with different starting times that are sufficiently spaced out so that only a logarithmic number of runs are needed, yet adjacent runs still have similar output sizes. Then, when a run expires, the subsequent run can still be used to report a good enough solution.
We observe that the executions of in the smooth histogram method are independent. Our key contribution that gives rise to our improvement is to forward the structure identified in a run of to the subsequent run. The algorithm, which we will discuss in detail in Section 4.1.1, maintains a partition of the real line that restrains the possible locations of optimal intervals that are yet to arrive in the stream. We target these locations individually in the subsequent run by initiating additional runs of on restricted domains where we expect to find many of these optimal intervals.
Our approach relies on a property of the algorithm that, at first glance, seems relatively insignificant. As proved by Cabello and Pérez-Lantero, the algorithm produces a solution of size at least , and thus only has an approximation factor of in an asymptotic sense. Consequently, if is a small constant then the algorithm achieves an approximation factor strictly below . We exploit this property in that we execute the additional runs of on small domains where we expect to find only a small constant number of optimal intervals, see Section 4.1.3 for further details.
Our -approximation lower bound for arbitrary-length intervals is also achieved via a reduction to a hard problem in one-way communication complexity. However, instead of exploiting the hardness of the two-party problem Index as in the unit-length case, we use the three-party problem Chain introduced by Cormode et al. [6]. In Chain, the first two parties and the last two parties hold separate Index instances that are correlated in that they have the same answer bit, i.e., , and the objective for the third party is to determine the bit . Chain also requires a message of size to be solved. Similar to the unit-length case, the first two parties introduce clique gadgets based on the bit-strings and , and the third party introduces additional crucial intervals. The strength of using Chain is that, if the answer bit is zero, then the crucial intervals corresponding to and of all clique gadgets are missing, while if the answer bit is one then all of these intervals are present. The method thus allows us to work with multiple clique gadgets instead of only a single one, which we exploit to obtain a stronger lower bound. See Section 4.2 for details.
Further Related Work
Crouch et al. [7] initiated the study of graph problems in the sliding window model (recall that Interval Selection is an independent set problem on interval graphs). They showed that, similar to the streaming model, there exist sliding window algorithms that use space for deciding Connectivity and Bipartiteness, where is the number of vertices in the input graph. They also gave positive results for the computation of cut-sparsifiers, spanners and minimum spanning trees, and they initiated the study of the Maximum Matching problem in the sliding window model (see below).
The smooth histogram technique has been successfully applied for designing sliding window algorithms for graph problems, and the state-of-the-art sliding window algorithms for Maximum Matching and Minimum Vertex Cover rely on the smooth histogram technique.
For Maximum Matching, a -approximation with space can easily be achieved in the streaming model by running the Greedy matching algorithm, and the smooth histogram method immediately yields a -approximation sliding window algorithm when built on Greedy. Crouch et al. [7] observed that the resulting algorithm can be analyzed more precisely and showed that it actually yields a -approximation sliding window algorithm. Regarding the weighted version of the Maximum Matching problem, the smooth histogram technique immediately yields a -approximation using the -approximation streaming algorithm by [15], and, again, as proved by Biabani et al. [3], the analysis can be tailored to the Maximum Matching problem to establish an approximation factor of without changing the algorithm. Alexandru et al. [1] then improved the approximation factor to by running the smooth histogram algorithm with a slightly different objective function.
Outline
In Section 2, we give notation, provide some clarification on the sliding window model, and introduce hard communication problems that we rely on for proving our lower bound results. Then, in Section 3, we give our algorithm and lower bound for the case of unit-length intervals, and in Section 4, we give our algorithm and lower bound for arbitrary-length intervals. Finally, we conclude in Section 5 with open problems.
2 Preliminaries
For a set of intervals , we denote by an independent subset of of maximum size. We also apply to substreams of intervals and to data structures that store intervals.
Sliding Window Algorithms
Throughout the document, we denote by the size of the sliding window, and we assume that is large enough, i.e., larger than a suitably large constant. For two streams of intervals we denote the stream that is obtained by concatenating and simply by , i.e., we omit a concatenation symbol. Furthermore, for simplicity, we assume that the space required to store an interval is . However, if instead bits are accounted for storing an interval then the space complexities of our algorithms need to be multiplied by .
Communication Complexity
As is standard in the data streaming literature, our space lower bounds are proved via reductions to problems in the one-way communication setting. In this setting, multiple parties each hold a portion of the input data and communicate in order to solve a problem. Communication is one-way, i.e., party sends a message to , who in turn sends a message to . This continues until party has received a message from party and then outputs the result of the computation. The parties can make use of public and private randomness and need to report a correct solution with probability . We refer the reader to [14] for an introduction to communication complexity.
We will exploit the hardness of the two-party communication problem Index, where we denote the first party by Alice and the second party by Bob, and the three-party communication problem Chain, which was recently introduced by Cormode et al. [6].
Index:
• Input: Alice holds a bit-string , and Bob holds an index .
• Output: Bob outputs .
It is well-known that solving Index requires Alice to send a message of size .
Theorem 1 (e.g. [12]).
Every randomized constant-error one-way communication protocol for Index requires a message of size .
The Chain problem can be regarded as chaining together instances of Index, where the instances are correlated in that they are guaranteed to have the same output.
Chain:
• Input: For , player receives a bitvector . Additionally, for any player receives an index . The inputs are correlated such that
• Output: Player outputs .
Sundaresan [17] recently settled the communication complexity of Chain, improving over the previous lower bounds by Cormode et al. [6] and Feldman et al. [11]:
Theorem 2 ([17]).
Every constant-error one-way communication protocol that solves Chain requires at least one message of size .
3 Unit-length Intervals
In this section, we give our sliding window algorithm (Subsection 3.1) and our lower bound (Subsection 3.2) for unit-length intervals.
3.1 Sliding Window Algorithm for Unit-length Intervals
We now describe our algorithm for unit-length intervals.
Our algorithm is simple: For each integer , the algorithm maintains in the latest interval of the current sliding window with its left boundary in . The key observation, which was also used in [2], is that the intervals and form independent sets, and one of these sets constitutes a -approximation.
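The bucket-and-parity idea described above can be sketched as follows. This is an illustrative sketch under our own assumptions rather than the listing of Algorithm 1: a unit-length interval is identified by its left endpoint, buckets are indexed by the integer part of the left endpoint, and the names `UnitIntervalWindow`, `insert` and `solution` are hypothetical.

```python
import math
from collections import OrderedDict
from typing import List, Tuple


class UnitIntervalWindow:
    """Keeps, per integer bucket, the latest active unit-length interval."""

    def __init__(self, window_size: int) -> None:
        self.window_size = window_size
        self.time = 0  # number of intervals seen so far
        # bucket floor(left) -> (arrival time, left endpoint), oldest arrival first
        self.latest: "OrderedDict[int, Tuple[int, float]]" = OrderedDict()

    def insert(self, left: float) -> None:
        self.time += 1
        bucket = math.floor(left)
        self.latest.pop(bucket, None)            # re-inserting moves the bucket to the end
        self.latest[bucket] = (self.time, left)
        # Drop buckets whose stored interval has left the sliding window.
        while self.latest:
            oldest = next(iter(self.latest))
            if self.latest[oldest][0] > self.time - self.window_size:
                break
            del self.latest[oldest]

    def solution(self) -> List[Tuple[float, float]]:
        # Stored intervals in buckets of equal parity are pairwise disjoint.
        even = [b for b in sorted(self.latest) if b % 2 == 0]
        odd = [b for b in sorted(self.latest) if b % 2 == 1]
        best = even if len(even) >= len(odd) else odd
        return [(self.latest[b][1], self.latest[b][1] + 1.0) for b in best]
```

The sketch stores one entry per non-empty bucket, so its memory footprint is proportional to the number of stored intervals, in line with the space argument given in the proof of Theorem 3.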
Theorem 3.
Algorithm 1 is a deterministic -approximation sliding window algorithm for Interval Selection on unit-length intervals that, at any moment, uses space, where is a maximum independent set of intervals in the current sliding window.
Proof.
We will first prove that Algorithm 1 indeed computes a -approximation, and then argue that the algorithm satisfies the memory requirements.
We call a unit-length interval active if it is included in the current sliding window, i.e., if it is one of the most recent intervals of the stream. Otherwise, we say that is expired.
Let be a maximum independent set in the current sliding window and let be the independent set reported by Algorithm 1. Define the indexed set latest as in the algorithm.
Approximation
We will show that
(1) |
holds, which then establishes the approximation factor of the sliding window algorithm.
First, we will prove holds. To this end, we will show that the function defined as is injective.
We will first argue that is well-defined in that exists, for every . Indeed, by inspecting the algorithm, when arrives in the stream, is set to , and, in particular, while is active, is never set to . It may, however, happen that it is replaced with an interval which appeared after . In both cases, is well-defined.
To see that is injective, observe that for any two intervals in , since these intervals are independent and of unit-length, the integer parts of their left endpoints are distinct. Hence, , for any two distinct intervals .
Since is well-defined and injective, we obtain that , which thus proves the first inequality of Inequality 1. It remains to prove the second, i.e., that also holds.
To see this, observe that, for two integers of the same parity, and (if they exist) are independent. This is because and the intervals have unit length. By the pigeonhole principle, there are at least intervals whose indices in latest have the same parity, which implies that .
Space
The algorithm stores intervals in the current sliding window. Then, as proved above, we have , which implies that the space used by the algorithm is .
∎
3.2 Space Lower Bound
We now show that sliding window algorithms that use space cannot compute a -approximation to Interval Selection on unit-length intervals, for any . Recall that, in the streaming model, a -approximation can be computed with space .
Theorem 4.
Let be any small constant. Then, any algorithm in the sliding window model that computes a -approximate solution to Interval Selection on unit-length intervals with probability at least requires a memory of size .
Proof.
Let be a sliding window algorithm for Interval Selection on unit-length intervals with approximation factor , for some .
We will show how can be used in order to obtain a communication protocol for Index.
To this end, let be Alice and Bob’s input to . The two players proceed as follows:
• Alice: Alice feeds the intervals into (in the given order), where
Alice then sends the memory state of to Bob.
• Bob: Using Alice’s message, Bob continues the execution of and feeds the interval
into . Bob also adds the intervals to , for in order to make the sliding window of the algorithm advance. Bob computes ’s output in the latest sliding window consisting of the intervals defined as .
This construction is illustrated in Figure 2.
We observe that if then , while if then . Since has an approximation factor of , needs to report the unique solution of size if , and a solution of size when . Bob can thus distinguish between the two cases and solve Index.
Since the protocol solves Index, by Theorem 1, the protocol must use a message of size . The protocol’s message is ’s memory state, and, hence, must use space .
∎
4 Arbitrary-length Intervals
In this section, we give our -approximation sliding window algorithm and our -approximation lower bound for Interval Selection on arbitrary-length intervals.
4.1 -approximation Sliding Window Algorithm
Our algorithm is obtained by running multiple instances of the Cabello and Pérez-Lantero streaming algorithm for Interval Selection on arbitrary-length intervals [5]. In the following, we abbreviate the algorithm by . Since we employ various properties of the algorithm, we discuss the algorithm in Subsection 4.1.1. We use the algorithm in the context of the smooth histogram technique, which we discuss in Subsection 4.1.2. Finally, we give our sliding window algorithm and its analysis in Subsection 4.1.3.
4.1.1 Cabello and Pérez-Lantero Algorithm
For an interval , we define and .
The Cabello and Pérez-Lantero algorithm is depicted in Algorithm 2.
The listing of the algorithm uses the auxiliary functions and , which return the left and right delimiters of an interval, respectively.
The key idea behind the algorithm is to maintain a partition of the real line that we refer to as a region partition. Initially, the algorithm starts with the single region , and as the algorithm proceeds, the real line is partitioned into half-open intervals. This is achieved as follows. Arriving intervals that cross a region boundary are ignored. Consider thus an arriving interval that lies entirely within a region. In each region, the algorithm stores the left-most (the interval with the left-most right delimiter) and right-most (the interval with the right-most left delimiter) intervals within the region that it has observed thus far. If the interval together with either the left-most or the right-most interval of the region forms an independent set of size two then the region is split into two regions and the left-most and right-most intervals are updated accordingly. Otherwise, if intersects with both the left-most and right-most intervals of the region then is only used to potentially replace the left-most and/or right-most intervals of the region.
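To make the region partition more concrete, the following sketch mimics the behaviour described above. It is not the listing of Algorithm 2: the choice of split point, the handling of the stored intervals after a split, the closed-interval convention, and all names (`Region`, `RegionPartition`, `feed`, `solution`) are our own assumptions for illustration, and the sketch makes no claim about the approximation guarantee.

```python
import bisect
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # closed interval [a, b] (assumed representation)


@dataclass
class Region:
    lo: float                         # the region is the half-open interval [lo, hi)
    hi: float
    left: Optional[Interval] = None   # stored interval with the left-most right endpoint
    right: Optional[Interval] = None  # stored interval with the right-most left endpoint


class RegionPartition:
    def __init__(self) -> None:
        self.regions: List[Region] = [Region(-math.inf, math.inf)]

    def feed(self, iv: Interval) -> None:
        a, b = iv
        # Linear-time lookup of the region containing a; kept simple for the sketch.
        idx = bisect.bisect_right([r.lo for r in self.regions], a) - 1
        reg = self.regions[idx]
        if b >= reg.hi:
            return  # the interval crosses a region boundary and is ignored
        if reg.left is None:
            reg.left = reg.right = iv
        elif a > reg.left[1]:
            # iv is disjoint from the left-most interval: split at a (assumed split point).
            keep = reg.right if reg.right[1] < a else reg.left
            self.regions[idx:idx + 1] = [Region(reg.lo, a, reg.left, keep),
                                         Region(a, reg.hi, iv, iv)]
        elif b < reg.right[0]:
            # iv is disjoint from the right-most interval: split at its left endpoint.
            m = reg.right[0]
            keep = reg.left if reg.left[0] >= m else reg.right
            self.regions[idx:idx + 1] = [Region(reg.lo, m, iv, iv),
                                         Region(m, reg.hi, keep, reg.right)]
        else:
            # iv intersects both stored intervals: only update them.
            if b < reg.left[1]:
                reg.left = iv
            if a > reg.right[0]:
                reg.right = iv

    def solution(self) -> List[Interval]:
        # One stored interval per non-empty region; regions are disjoint.
        return [r.left for r in self.regions if r.left is not None]
```

In this sketch a split only ever replaces one region by two, so the number of regions never decreases as more intervals are fed.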
Some key properties of the algorithm that we will reuse in this work are summarized in Figure 3 (see [5] for proofs).
Besides these properties, we require another property that allows us to employ the algorithm in the context of the smooth histogram technique:
Lemma 1.
The algorithm is monotonic, i.e., for any two streams of intervals we have that
Proof.
The output produced by the algorithm consists of one interval per region. The lemma then follows since, by construction, the number of regions cannot decrease. ∎
4.1.2 The Smooth Histogram Technique
The algorithm can be employed in the context of the smooth histogram method to yield a -approximation sliding window algorithm for Interval Selection for arbitrary-length intervals that uses space . This is achieved as follows (see Algorithm 3):
Upon the arrival of a new interval, Algorithm 3 first creates a new run of the algorithm and feeds the new interval into all currently running copies of . The method relies on a clever way of deleting unnecessary runs: A run is deleted if it is expired (i.e., it contains an interval which appeared before the start of the current region) and if the closest run with earlier start time and the closest run with later start time are such that their solutions differ in size by less than a factor. Consider the moment after a clean-up took place, and let us denote the stored runs by . The clean-up rule implies that the stored runs have the properties depicted in Figure 4.
Property S1 implies that there are at most active runs of and thus the space of Algorithm 3 is at most a factor larger than the space used by .
Property S2 implies that either consecutive runs differ by at most a factor in solution size or are such that their starting times differ by only a single interval.
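The bookkeeping behind the two properties can be illustrated with a generic smooth-histogram skeleton. This is a sketch under our own assumptions and not the listing of Algorithm 3: the clean-up rule is the textbook variant (delete a run when its two neighbours are within a factor of `1 - beta` of each other), the base algorithm is replaced by a trivial stand-in `CountRun` that merely counts items, and all names are hypothetical.

```python
from typing import Any, Callable, List, Tuple


class CountRun:
    """Stand-in base algorithm: its value is the number of items fed so far."""

    def __init__(self) -> None:
        self.count = 0

    def feed(self, item: Any) -> None:
        self.count += 1

    def value(self) -> int:
        return self.count


class SmoothHistogram:
    def __init__(self, window_size: int, beta: float, new_run: Callable[[], Any]) -> None:
        self.window_size = window_size
        self.beta = beta
        self.new_run = new_run
        self.time = 0                            # number of items seen so far
        self.runs: List[Tuple[int, Any]] = []    # (start position, run), oldest first

    def window_start(self) -> int:
        return max(0, self.time - self.window_size)

    def insert(self, item: Any) -> None:
        # Start a new run at the current position and feed the item to every run.
        self.runs.append((self.time, self.new_run()))
        for _, run in self.runs:
            run.feed(item)
        self.time += 1
        # Clean-up: drop a run whose two neighbours have similar values.
        i = 0
        while i + 2 < len(self.runs):
            if self.runs[i + 2][1].value() >= (1 - self.beta) * self.runs[i][1].value():
                del self.runs[i + 1]
            else:
                i += 1
        # Keep at most one run that started before the current window.
        start = self.window_start()
        while len(self.runs) >= 2 and self.runs[1][0] <= start:
            del self.runs[0]

    def query(self) -> int:
        # Report the oldest run that started inside the current window.
        start = self.window_start()
        for s, run in self.runs:
            if s >= start:
                return run.value()
        raise RuntimeError("no active run")
```

With the counting stand-in, the number of stored runs stays logarithmic in the largest value and the reported value is within a factor `1 - beta` of the true window count; for Interval Selection, `new_run` would instead create an instance of the base streaming algorithm.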
We now provide a proof that allows us to see that Algorithm 3 is a -approximation algorithm for Interval Selection for arbitrary-length intervals. This proof will establish insight into how the analysis of our more involved -approximation algorithm is conducted and thus serves as a warm-up.
Theorem 5.
Algorithm 3 is a -approximation sliding window algorithm for Interval Selection on arbitrary-length intervals that uses space , where denotes an optimal solution in the current sliding window.
Proof.
Recall that the output of Algorithm 3 is the output of the oldest run of an instance of , and let us denote this run by . First, if the start position of coincides with the oldest interval of the current sliding window then was run on the entire region and we immediately obtain an approximation factor of (Property C2 of the algorithm). Hence, suppose that this is not the case, and denote the run that has expired most recently by . Observe that, prior to expiring, the runs and were adjacent. Furthermore, the two runs differ by more than one interval since otherwise the starting position of would coincide with the oldest interval of the current sliding window. We now consider the suffix of intervals in the stream starting at the start position of run . This suffix is partitioned into three parts , where are the intervals that arrived prior to the starting position of , are the intervals that arrived from the moment onward when the runs and became adjacent (either after a clean-up or they may have been adjacent from the moment onward when was created), and are the intervals between and (if there are any).
4.1.3 Sliding Window Algorithm
We will expand upon the smooth histogram method as described in Algorithm 4. The key idea is to exploit the structure of the regions created by the runs of in the smooth histogram algorithm. Based on these regions, we instantiate additional runs that target areas in which we expect to find many optimal intervals.
We will now proceed and analyse Algorithm 4. To this end, we consider any fixed current sliding window.
First, similar to the analysis of Algorithm 3, we note that if the starting position of the oldest run of , denoted , coincides with the left delimiter of the sliding window then we immediately obtain a -approximation (by Property C2). Suppose thus that this is not the case. Again, we consider the run , which is the latest run that has expired and was previously adjacent to . We also consider the suffix of intervals , where are the intervals starting at the starting position of and ending before the starting position of , are the intervals that occurred after and became adjacent, and are the remaining intervals. Let , let , and define and similarly. Since the current sliding window is a subset of , we have that an optimal solution in the current sliding window is of size at most .
In Algorithm 4, we run the smooth histogram algorithm, Algorithm 3, with respect to a parameter (i.e., replace parameter in the listing with ). Then, as proved in Theorem 5, we always have at least a -approximation at our disposal, i.e.,
We define such that
(2) |
In the following, we will argue that if is close to then we can find a better solution using the runs associated with .
Let be the regions created by at the moment when and became adjacent, i.e., the regions created by the run . By Property C2, we have . For each region , let , i.e., the number of optimal intervals in that lie within the region . Furthermore, we define .
In the next lemma, we prove that, provided is small, the quantity is necessarily large, i.e., there are many optimal intervals in that lie within the regions . We will later argue that the associated runs with can then be used to find many of these.
Lemma 2.
Proof.
Observe that , since at most intervals of can intersect the region boundaries , another intervals of can lie within the regions, and the remaining ones are the intervals of .
which implies the result. ∎
Consider now the run , which coincides with the run until and became adjacent. Let be the intervals computed by this run that do not intersect the boundaries of and let be the intervals computed by this run that intersect the boundaries of . Then, since , we have that either or . We treat both cases separately in Lemmas 3 and 4:
Lemma 3.
Suppose that . Then, using the associated runs of , we can output a solution of size at least
Proof.
We call a region good if it contains an interval from . We output the solution obtained from the runs of on , for all , and if such a run on a good region leads to no intervals (i.e. ), then we output the interval from instead. Recall that outputs a solution of size if (Property C2), and we stress here that the additive is key for our analysis. We thus obtain a solution of size at least:
Lemma 4.
Suppose that . Then, using the associated runs of , we can output a solution of size at least
Proof.
Let be the intervals of that lie on the boundary of two regions where is even, and let . Then, either or .
Suppose that . We only analyse this case since the other case is similar.
We call an even index good if there is an interval in that lies on . We consider the runs of on pairs of regions where is even. Then, we find a solution of size:
Using the identity and that is the number of good regions and proceeding as in the proof of Lemma 3, we obtain:
∎
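The two output rules used in the proofs of Lemmas 3 and 4 can be sketched as follows. This is a hedged illustration only: the function names, the list-based representation of regions, and the convention that boundary `j` separates regions `j` and `j + 1` are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]


def combine_region_solutions(
    run_outputs: List[List[Interval]],
    stored: List[Optional[Interval]],
) -> List[Interval]:
    """Rule from the proof of Lemma 3 (sketch).

    run_outputs[j] holds the intervals found by the associated run on
    region j; stored[j] is the interval of the earlier run lying inside
    region j, or None if the region is not good.
    """
    solution: List[Interval] = []
    for output, fallback in zip(run_outputs, stored):
        if output:
            solution.extend(output)
        elif fallback is not None:
            # The run found nothing in a good region: fall back to the
            # interval that is already known to lie inside the region.
            solution.append(fallback)
    return solution


def larger_parity_class(crossing: Dict[int, Interval]) -> Dict[int, Interval]:
    """Parity split from the proof of Lemma 4 (sketch).

    crossing maps a boundary index j to the stored interval lying on that
    boundary. One of the two parity classes contains at least half of the
    intervals, and that class is returned.
    """
    even = {j: iv for j, iv in crossing.items() if j % 2 == 0}
    odd = {j: iv for j, iv in crossing.items() if j % 2 == 1}
    return even if len(even) >= len(odd) else odd
```

Because regions are pairwise disjoint, the intervals collected by `combine_region_solutions` do not intersect one another, which is what allows the per-region contributions to be summed.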
Theorem 6.
For any constant , Algorithm 4 is a -approximation sliding window algorithm for Interval Selection on arbitrary-length intervals that uses space .
Proof.
The naive smooth histogram method gives us a solution of size
where we used the monotonicity of (Lemma 1) and Property S2. Using the associated runs, by Lemmas 3 and 4, we get a solution of size at least
Since we can output the larger of the two solutions, in the worst case both solutions have the same value, i.e., when:
which implies . The approximation factor thus is for any :
Choosing , and rescaling to gives the result.
As a consequence of Property S1, as previously established, the smooth histogram algorithm uses space. It remains to argue that the runs created in Steps 1 and 2 of Algorithm 4 only increase the space requirements by a constant times .
Indeed, for a fixed instance , all the runs created by Step 1 are pairwise disjoint (they do not store common intervals) so their cumulative space is as we assumed the memory required to store an interval is . Similarly, for the runs created by Step 2, an interval appears in at most two such runs. So, the cumulative space is again . Therefore, the total number of intervals stored in the associated runs is at most , completing the proof. ∎
The analysis of the approximation factor of the algorithm is shown to be tight in Appendix A, meaning that the algorithm does not beat the approximation factor of .
4.2 Space Lower Bound
We now give our space lower bound for sliding window algorithms for Interval Selection on arbitrary-length intervals. Our result is established by a reduction to the three-party communication problem .
Theorem 7.
Let be any small constant. Then, any algorithm in the sliding window model that computes a -approximate solution to Interval Selection on arbitrary-length intervals requires a memory of size .
Proof.
Let be a sliding window algorithm with approximation factor , for some , as in the statement of the theorem, and let , where is the window length. We will argue how can be solved with the help of .
To this end, denote the three parties in the communication problem by Alice, Bob, and Charlie. Let be Alice’s input, let and be Bob’s input, and let be Charlie’s input. The players proceed as follows:
• Alice: For every , Alice feeds the following intervals into :
The order given is . We observe that, for every , when , the intervals and are disjoint. Alice sends the memory state of to Bob.
• Bob: For every , Bob feeds the following interval into :
Let . Notice that, for every , when , we have that is disjoint with both and if and only if . Otherwise, intersects with both and .
Bob sends the memory state of and to Charlie.
• Charlie: We denote the interval boundaries of by and , i.e., . Charlie feeds the following two intervals into :
Notice that intersects all intervals of , for all , while intersects all intervals of , for all .
Using , Charlie computes the largest independent set of
which is possible since is a sliding window algorithm and can therefore handle the situation in which the intervals have expired.
The total number of intervals added by the three players is . So, after Charlie’s execution , the incumbent region indeed consists of .
We will argue now that if then the optimal solution size is , while if then the optimal solution size is .
Suppose thus that . Then it is not hard to see that the unique optimal solution is of size .
Next, suppose that . Notice first that, in this case, intersect with every other interval in the input, so they can only belong to independent sets of size at most .
Also, we have that any interval with would block all the intervals for and . So, an interval from with can be included in an optimal set of size at most (either or for some ). Similarly, with can be included in an optimal set of size at most . Furthermore, we can construct from Bob and Charlie’s input a solution of size at most (similar to the lower bound construction of [10]). The size of an optimal solution is thus in this case .
Recall that has an approximation factor of . Hence, if then reports a solution of size at least , thereby distinguishing it from the case when , which yields an optimal size of .
Since, by Theorem 2, requires a message of size , and since the protocol solely consists of forwarding the memory state of , we conclude that requires a memory of size , which completes the proof. ∎
5 Conclusion
In this paper, we initiated the study of the Interval Selection problem in the sliding window model of computation. We gave algorithms and lower bounds for both unit-length and arbitrary-length intervals. In the unit-length case, we gave a -approximation algorithm that uses space , and we showed that this is best possible in that any -approximation algorithm requires space . In the arbitrary-length case, we gave a -approximation algorithm that uses space , and we showed that any -approximation algorithm requires space . Contrasted with results known from the one-pass streaming setting, our result implies that Interval Selection in both the unit-length and the arbitrary-length cases is harder to solve in the sliding window setting than in the one-pass streaming setting.
We conclude with two open questions.
First, the approximation guarantees of our algorithm for arbitrary-length intervals and our respective lower bound do not match. Can we close this gap?
Second, the sliding window model has received significantly less attention for the study of graph problems than the traditional one-pass streaming setting. From a theoretical perspective, the sliding window model is less clean than the one-pass streaming model, as discussed in the introduction, yet it is the more suitable model for many applications. We are particularly interested in understanding the differences between the two models. For example, which graph problems can be solved equally well in the sliding window model as in the one-pass streaming setting, and which problems are significantly harder to solve?
References
- [1] Cezar-Mihail Alexandru, Pavel Dvorák, Christian Konrad, and Kheeran K. Naidu. Improved weighted matching in the sliding window model. In Petra Berenbrink, Patricia Bouyer, Anuj Dawar, and Mamadou Moustapha Kanté, editors, 40th International Symposium on Theoretical Aspects of Computer Science, STACS 2023, March 7-9, 2023, Hamburg, Germany, volume 254 of LIPIcs, pages 6:1–6:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.STACS.2023.6.
- [2] Ainesh Bakshi, Nadiia Chepurko, and David P. Woodruff. Weighted maximum independent set of geometric objects in turnstile streams. In International Workshop and International Workshop on Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 2019. URL: https://api.semanticscholar.org/CorpusID:67856291.
- [3] Leyla Biabani, Mark de Berg, and Morteza Monemizadeh. Maximum-weight matching in sliding windows and beyond. 2021. URL: https://api.semanticscholar.org/CorpusID:245276580.
- [4] Vladimir Braverman and Rafail Ostrovsky. Smooth histograms for sliding windows. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2007), October 20-23, 2007, Providence, RI, USA, Proceedings, pages 283–293. IEEE Computer Society, 2007. doi:10.1109/FOCS.2007.55.
- [5] Sergio Cabello and Pablo Pérez-Lantero. Interval selection in the streaming model. Theor. Comput. Sci., 702:77–96, 2017. doi:10.1016/j.tcs.2017.08.015.
- [6] Graham Cormode, Jacques Dark, and Christian Konrad. Independent sets in vertex-arrival streams. ArXiv, abs/1807.08331, 2018. URL: https://api.semanticscholar.org/CorpusID:49907556.
- [7] Michael S. Crouch, Andrew McGregor, and Daniel M. Stubbs. Dynamic graphs in the sliding-window model. In Hans L. Bodlaender and Giuseppe F. Italiano, editors, Algorithms - ESA 2013 - 21st Annual European Symposium, Sophia Antipolis, France, September 2-4, 2013. Proceedings, volume 8125 of Lecture Notes in Computer Science, pages 337–348. Springer, 2013. doi:10.1007/978-3-642-40450-4_29.
- [8] Jacques Dark, Adithya Diddapur, and Christian Konrad. Interval selection in data streams: Weighted intervals and the insertion-deletion setting. In Foundations of Software Technology and Theoretical Computer Science, 2023. URL: https://api.semanticscholar.org/CorpusID:266192962.
- [9] Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Maintaining stream statistics over sliding windows: (extended abstract). In Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’02, pages 635–644, USA, 2002. Society for Industrial and Applied Mathematics.
- [10] Yuval Emek, Magnús M. Halldórsson, and Adi Rosén. Space-constrained interval selection. ACM Trans. Algorithms, 12(4):51:1–51:32, 2016. doi:10.1145/2886102.
- [11] Moran Feldman, Ashkan Norouzi-Fard, Ola Svensson, and Rico Zenklusen. The one-way communication complexity of submodular maximization with applications to streaming and robustness. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, pages 1363–1374, New York, NY, USA, 2020. Association for Computing Machinery. doi:10.1145/3357713.3384286.
- [12] T. S. Jayram, Ravi Kumar, and D. Sivakumar. The one-way communication complexity of Hamming distance. Theory Comput., 4:129–135, 2008. URL: https://api.semanticscholar.org/CorpusID:15825208.
- [13] Robert Krauthgamer and David Reitblat. Almost-smooth histograms and sliding-window graph algorithms. Algorithmica, 84(10):2926–2953, 2022. doi:10.1007/s00453-022-00988-y.
- [14] Eyal Kushilevitz and Noam Nisan. Communication complexity. Cambridge University Press, 1997.
- [15] Ami Paz and Gregory Schwartzman. A (2+ε)-approximation for maximum weight matching in the semi-streaming model. ACM Trans. Algorithms, 15(2):18:1–18:15, 2019. doi:10.1145/3274668.
- [16] Sai Krishna Chaitanya Nalam Venkata Subrahmanya. Vertex cover in the sliding window model. Master’s thesis, Rutgers, The State University of New Jersey, 2021.
- [17] Janani Sundaresan. Optimal communication complexity of chained index, 2024. arXiv:2404.07026.
Appendix A Hard Instance for the Analysis of Algorithm 4
A.1 Description of the Instance
We will present a hard instance demonstrating that the analysis of Algorithm 4 is tight. We assume that the smooth histogram parameter is set to . This is a reasonable assumption since the approximation factor of the algorithm approaches the optimal value of when .
As before, let be the oldest still active instance created by the smooth histogram algorithm, and let be the expired instance which came before . Given and , we can divide the active stream into successive parts . represents the intervals that arrived before the starting position of . represents the intervals that arrived right after the runs and became adjacent (i.e., after all the instances between and have been deleted). The intervals arriving after but before are denoted as .
The smooth histogram condition then translates to .
Let be a positive integer divisible by 3. We will first give the full stream in Algorithm 5 and then explain the purpose of each portion of the stream.
We call the created regions with , for some integer , good. Notice that the number of regions created by is while the number of good regions is exactly because is divisible by 3.
• Stream
Stream has three parts in this order: .
– Stream
This stream is responsible for creating the regions for any integer and regions .
– Stream
This stream inserts intervals inside the regions . Notice that the interval is completely inside the region and replaces the interval as both the leftmost and the rightmost intervals of the region.
– Stream
This stream inserts intervals that intersect the boundaries of regions created by . The stream contributes to the size of the optimal solution of the overall stream .
• Stream
If we execute stream immediately after stream , the regions created in will remain unchanged. In , we only add items inside or intersecting the good regions. Consider therefore a good region .
The intervals are intervals crossing the boundary of . They completely include the boundary intervals of (i.e., from the previous region and ). The interval is an interval inside that intersects the interval .
• Stream
The stream is divided into and only adds intervals completely included within good regions. Consider therefore a good region .
– Stream
The purpose of the stream is to create new regions from the regions of the run . The new regions created inside are and .
– Stream
The stream contributes to of our instance, where is an independent set of optimal size inside the stream . The interval intersects the interval , but it does not intersect the interval . The interval has similar properties. The intervals and intersect with the boundaries of the regions of , so they will not be saved by the algorithm. Lastly, the interval does not intersect with , but it intersects with .
A.2 Analysis of the Instance
Here we will prove that the output is indeed a approximation of the optimal solution, therefore proving that our analysis of Algorithm 4 is best possible.
Lemma 5.
The streams yield , hence the smooth histogram condition is obeyed.
Proof.
Notice that after the run of stream , we have created the regions for and regions .
Now, we consider a good region . The saved interval of inside is .
When the stream arrives, the intervals and cross the boundaries of . The interval intersects with the interval , so only the rightmost interval of the region is changed after the stream is processed.
Since no new regions are created by , we can argue that (the number of regions created by ). Furthermore, all the intervals of are pairwise independent so that , hence proving the required lemma.
∎
Lemma 6.
Let be an optimal independent set of stream . Then, .
Proof.
Inspecting the intervals given by the streams , we see that they form an independent set. We have
-
•
,
-
•
, and
-
•
.
Hence, we obtain that , which implies as required. ∎
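Claims of this kind can be sanity-checked offline on a concrete instantiation of the stream. The sketch below is only an illustrative helper and not part of the construction: the function names are hypothetical, intervals are assumed to be closed and given as `(left, right)` pairs, and the optimum is computed with the classical earliest-right-endpoint greedy, which is exact for Interval Selection when the whole input is available.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def is_independent(intervals: List[Interval]) -> bool:
    """Check that the closed intervals are pairwise disjoint.

    After sorting by left endpoint it suffices to compare neighbours.
    """
    ordered = sorted(intervals)
    return all(
        ordered[k][1] < ordered[k + 1][0] for k in range(len(ordered) - 1)
    )


def max_independent_set(intervals: List[Interval]) -> List[Interval]:
    """Return a largest set of pairwise disjoint intervals (offline greedy)."""
    chosen: List[Interval] = []
    last_end = float("-inf")
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if left > last_end:
            chosen.append((left, right))
            last_end = right
    return chosen


if __name__ == "__main__":
    example = [(0, 3), (1, 2), (2.5, 4), (5, 6)]
    best = max_independent_set(example)
    print(len(best), is_independent(best))
```

Feeding the concatenated streams of the hard instance into `max_independent_set` gives the optimal size against which the outputs analysed in the following lemmas can be compared.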
Lemma 7.
The naive smooth histogram approach outputs a solution of size .
Proof.
Recall that all intervals in and are inserted only into good regions or at the boundary of good regions. Let be a good region (i.e., for integer ). We will show that (the good region and its neighbouring regions).
After processing the stream, we have region boundaries at , and . Observe that all of the intervals of cross the region boundaries at , and , so they do not get saved by the run of . Furthermore, we have that the interval crosses the region boundary at while the interval crosses the boundary at , so these intervals also do not get saved.
When processing , however, the intervals get inserted into the solution. Additionally, they do not change the structure of the regions created by the run (i.e., they only modify the leftmost or the rightmost interval of each region created by ).
So, . Because there are good regions and the intervals where do not pairwise intersect, we have as required.
∎
Lemma 8.
Steps 1 and 2 of Algorithm 4 output a solution of size .
Proof.
First, observe that steps 1 and 2 of Algorithm 4 are run on substream . Furthermore, since only good regions contain intervals in substream , it suffices to explore how steps 1 and 2 act on good regions.
In each good region, the stream is responsible for creating the regions of the runs of Algorithm 4. Observe that the intervals and cross the boundaries of these regions and are thus not stored by the algorithm. In each good region, only the intervals of are stored. Therefore, we obtain a solution of size for each good region. Overall, the solution obtained by the runs of steps 1 and 2 of Algorithm 4 is of size .
∎
Using the last two lemmas, we obtain the following conclusion:
Theorem 8.
Let be the size of the solution output by Algorithm 4 on the described input. Then,.
Proof.
By the previous two lemmas, both and steps 1 and 2 of Algorithm 4 output a solution of size . Notice that the set of saved intervals of steps 1 and 2 of Algorithm 4 is a subset of the saved intervals of , therefore we cannot improve the overall solution by combining both solutions. So, .
By Lemma 6, . So, we get the required conclusion. ∎