Search | arXiv e-print repository

Adaptive Quotient Filters

Authors: Richard Wen, Hunter McCoy, David Tench, Guido Tagliavini, Michael A. Bender, Alex Conway, Martin Farach-Colton, Rob Johnson, Prashant Pandey

Abstract: Adaptive filters, such as telesco** and adaptive cuckoo filters, update their representation upon detecting a false positive to avoid repeating the same error in the future. Adaptive filters require an auxiliary structure, typically much larger than the main filter and often residing on slow storage, to facilitate adaptation. However, existing adaptive filters are not practical and have seen no… ▽ More Adaptive filters, such as telesco** and adaptive cuckoo filters, update their representation upon detecting a false positive to avoid repeating the same error in the future. Adaptive filters require an auxiliary structure, typically much larger than the main filter and often residing on slow storage, to facilitate adaptation. However, existing adaptive filters are not practical and have seen no adoption in real-world systems due to two main reasons. Firstly, they offer weak adaptivity guarantees, meaning that fixing a new false positive can cause a previously fixed false positive to come back. Secondly, the sub-optimal design of the auxiliary structure results in adaptivity overheads so substantial that they can actually diminish the overall system performance compared to a traditional filter. In this paper, we design and implement AdaptiveQF, the first practical adaptive filter with minimal adaptivity overhead and strong adaptivity guarantees, which means that the performance and false-positive guarantees continue to hold even for adversarial workloads. The AdaptiveQF is based on the state-of-the-art quotient filter design and preserves all the critical features of the quotient filter such as cache efficiency and mergeability. Furthermore, we employ a new auxiliary structure design which results in considerably low adaptivity overhead and makes the AdaptiveQF practical in real systems. △ Less

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2203.14927 [pdf, other]

GraphZeppelin: Storage-Friendly Sketching for Connected Components on Dynamic Graph Streams

Authors: David Tench, Evan West, Victor Zhang, Michael A. Bender, Abiyaz Chowdhury, J. Ahmed Dellas, Martin Farach-Colton, Tyler Seip, Kenny Zhang

Abstract: Finding the connected components of a graph is a fundamental problem with uses throughout computer science and engineering. The task of computing connected components becomes more difficult when graphs are very large, or when they are dynamic, meaning the edge set changes over time subject to a stream of edge insertions and deletions. A natural approach to computing the connected components on a l… ▽ More Finding the connected components of a graph is a fundamental problem with uses throughout computer science and engineering. The task of computing connected components becomes more difficult when graphs are very large, or when they are dynamic, meaning the edge set changes over time subject to a stream of edge insertions and deletions. A natural approach to computing the connected components on a large, dynamic graph stream is to buy enough RAM to store the entire graph. However, the requirement that the graph fit in RAM is prohibitive for very large graphs. Thus, there is an unmet need for systems that can process dense dynamic graphs, especially when those graphs are larger than available RAM. We present a new high-performance streaming graph-processing system for computing the connected components of a graph. This system, which we call GraphZeppelin, uses new linear sketching data structures (CubeSketches) to solve the streaming connected components problem and as a result requires space asymptotically smaller than the space required for a lossless representation of the graph. GraphZeppelin is optimized for massive dense graphs: GraphZeppelin can process millions of edge updates (both insertions and deletions) per second, even when the underlying graph is far too large to fit in available RAM. As a result GraphZeppelin vastly increases the scale of graphs that can be processed. △ Less

Submitted 28 March, 2022; originally announced March 2022.

Comments: 15 pages, 16 figures, SIGMOD 2022

arXiv:2102.08476 [pdf, other]

Maximum Coverage in the Data Stream Model: Parameterized and Generalized

Authors: Andrew McGregor, David Tench, Hoa T. Vu

Abstract: We present algorithms for the Max-Cover and Max-Unique-Cover problems in the data stream model. The input to both problems are $m$ subsets of a universe of size $n$ and a value $k\in [m]$. In Max-Cover, the problem is to find a collection of at most $k$ sets such that the number of elements covered by at least one set is maximized. In Max-Unique-Cover, the problem is to find a collection of at mos… ▽ More We present algorithms for the Max-Cover and Max-Unique-Cover problems in the data stream model. The input to both problems are $m$ subsets of a universe of size $n$ and a value $k\in [m]$. In Max-Cover, the problem is to find a collection of at most $k$ sets such that the number of elements covered by at least one set is maximized. In Max-Unique-Cover, the problem is to find a collection of at most $k$ sets such that the number of elements covered by exactly one set is maximized. Our goal is to design single-pass algorithms that use space that is sublinear in the input size. Our main algorithmic results are: If the sets have size at most $d$, there exist single-pass algorithms using $\tilde{O}(d^{d+1} k^d)$ space that solve both problems exactly. This is optimal up to polylogarithmic factors for constant $d$. If each element appears in at most $r$ sets, we present single pass algorithms using $\tilde{O}(k^2 r/ε^3)$ space that return a $1+ε$ approximation in the case of Max-Cover. We also present a single-pass algorithm using slightly more memory, i.e., $\tilde{O}(k^3 r/ε^{4})$ space, that $1+ε$ approximates Max-Unique-Cover. In contrast to the above results, when $d$ and $r$ are arbitrary, any constant pass $1+ε$ approximation algorithm for either problem requires $Ω(ε^{-2}m)$ space but a single pass $O(ε^{-2}mk)$ space algorithm exists. In fact any constant-pass algorithm with an approximation better than $e/(e-1)$ and $e^{1-1/k}$ for Max-Cover and Max-Unique-Cover respectively requires $Ω(m/k^2)$ space when $d$ and $r$ are unrestricted. En route, we also obtain an algorithm for a parameterized version of the streaming Set-Cover problem. △ Less

Submitted 16 February, 2021; originally announced February 2021.

Comments: Conference version to appear at ICDT 2021

arXiv:1902.04738 [pdf, other]

doi 10.1145/3314221.3314582

Mesh: Compacting Memory Management for C/C++ Applications

Authors: Bobby Powers, David Tench, Emery D. Berger, Andrew McGregor

Abstract: Programs written in C/C++ can suffer from serious memory fragmentation, leading to low utilization of memory, degraded performance, and application failure due to memory exhaustion. This paper introduces Mesh, a plug-in replacement for malloc that, for the first time, eliminates fragmentation in unmodified C/C++ applications. Mesh combines novel randomized algorithms with widely-supported virtual… ▽ More Programs written in C/C++ can suffer from serious memory fragmentation, leading to low utilization of memory, degraded performance, and application failure due to memory exhaustion. This paper introduces Mesh, a plug-in replacement for malloc that, for the first time, eliminates fragmentation in unmodified C/C++ applications. Mesh combines novel randomized algorithms with widely-supported virtual memory operations to provably reduce fragmentation, breaking the classical Robson bounds with high probability. Mesh generally matches the runtime performance of state-of-the-art memory allocators while reducing memory consumption; in particular, it reduces the memory of consumption of Firefox by 16% and Redis by 39%. △ Less

Submitted 16 February, 2019; v1 submitted 12 February, 2019; originally announced February 2019.

Comments: Draft version, accepted at PLDI 2019

arXiv:1506.04417 [pdf, ps, other]

Densest Subgraph in Dynamic Graph Streams

Authors: Andrew McGregor, David Tench, Sofya Vorotnikova, Hoa T. Vu

Abstract: In this paper, we consider the problem of approximating the densest subgraph in the dynamic graph stream model. In this model of computation, the input graph is defined by an arbitrary sequence of edge insertions and deletions and the goal is to analyze properties of the resulting graph given memory that is sub-linear in the size of the stream. We present a single-pass algorithm that returns a… ▽ More In this paper, we consider the problem of approximating the densest subgraph in the dynamic graph stream model. In this model of computation, the input graph is defined by an arbitrary sequence of edge insertions and deletions and the goal is to analyze properties of the resulting graph given memory that is sub-linear in the size of the stream. We present a single-pass algorithm that returns a $(1+ε)$ approximation of the maximum density with high probability; the algorithm uses $O(ε^{-2} n \polylog n)$ space, processes each stream update in $\polylog (n)$ time, and uses $\poly(n)$ post-processing time where $n$ is the number of nodes. The space used by our algorithm matches the lower bound of Bahmani et al.~(PVLDB 2012) up to a poly-logarithmic factor for constant $ε$. The best existing results for this problem were established recently by Bhattacharya et al.~(STOC 2015). They presented a $(2+ε)$ approximation algorithm using similar space and another algorithm that both processed each update and maintained a $(4+ε)$ approximation of the current maximum density in $\polylog (n)$ time per-update. △ Less

Submitted 14 June, 2015; originally announced June 2015.

Comments: To appear in MFCS 2015

Showing 1–5 of 5 results for author: Tench, D