
Showing 1–15 of 15 results for author: McCauley, S

  1. arXiv:2407.02468

    cs.DS

    Improved Space-Efficient Approximate Nearest Neighbor Search Using Function Inversion

    Authors: Samuel McCauley

    Abstract: Approximate nearest neighbor search (ANN) data structures have widespread applications in machine learning, computational biology, and text processing. The goal of ANN is to preprocess a set S so that, given a query q, we can find a point y whose distance from q approximates the smallest distance from q to any point in S. For most distance functions, the best-known ANN bounds for high-dimensional…

    Submitted 2 July, 2024; originally announced July 2024.

  2. SPIDER: Improved Succinct Rank and Select Performance

    Authors: Matthew D. Laws, Jocelyn Bliven, Kit Conklin, Elyes Laalai, Samuel McCauley, Zach S. Sturdevant

    Abstract: Rank and select data structures seek to preprocess a bit vector to quickly answer two kinds of queries: rank(i) gives the number of 1 bits in slots 0 through i, and select(j) gives the first slot s with rank(s) = j. A succinct data structure can answer these queries while using space much smaller than the size of the original bit vector. State-of-the-art succinct rank and select data structures…

    Submitted 8 May, 2024; originally announced May 2024.

  3. arXiv:2404.17544

    cs.DS

    Root-to-Leaf Scheduling in Write-Optimized Trees

    Authors: Christopher Chung, William Jannen, Samuel McCauley, Bertrand Simon

    Abstract: Write-optimized dictionaries are a class of cache-efficient data structures that buffer updates and apply them in batches to optimize the amortized cache misses per update. For example, a B^epsilon tree inserts updates as messages at the root. B^epsilon trees only move ("flush") messages when they have total size close to a cache line, optimizing the amount of work done per cache line written. Thu…

    Submitted 26 April, 2024; originally announced April 2024.

  4. arXiv:2402.11028

    cs.DS

    Incremental Topological Ordering and Cycle Detection with Predictions

    Authors: Samuel McCauley, Benjamin Moseley, Aidin Niaparast, Shikha Singh

    Abstract: This paper leverages the framework of algorithms-with-predictions to design data structures for two fundamental dynamic graph problems: incremental topological ordering and cycle detection. In these problems, the input is a directed graph on $n$ nodes, and the $m$ edges arrive one by one. The data structure must maintain a topological ordering of the vertices at all times and detect if the newly i…

    Submitted 16 February, 2024; originally announced February 2024.

  5. arXiv:2305.10536

    cs.DS cs.LG

    Online List Labeling with Predictions

    Authors: Samuel McCauley, Benjamin Moseley, Aidin Niaparast, Shikha Singh

    Abstract: A growing line of work shows how learned predictions can be used to break through worst-case barriers to improve the running time of an algorithm. However, incorporating predictions into data structures with strong theoretical guarantees remains underdeveloped. This paper takes a step in this direction by showing that predictions can be leveraged in the fundamental online list labeling problem. In…

    Submitted 20 June, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  6. arXiv:2107.02866

    cs.DS

    Telescoping Filter: A Practical Adaptive Filter

    Authors: David J. Lee, Samuel McCauley, Shikha Singh, Max Stein

    Abstract: Filters are fast, small and approximate set membership data structures. They are often used to filter out expensive accesses to a remote set S for negative queries (that is, a query x not in S). Filters have one-sided errors: on a negative query, a filter may say "present" with a tunable false-positive probability of epsilon. Correctness is traded for space: filters only use log (1/ε) + O(1) bits p…

    Submitted 6 July, 2021; originally announced July 2021.

  7. arXiv:2105.10622

    cs.DS

    Support Optimality and Adaptive Cuckoo Filters

    Authors: Tsvi Kopelowitz, Samuel McCauley, Ely Porat

    Abstract: Filters (such as Bloom Filters) are data structures that speed up network routing and measurement operations by storing a compressed representation of a set. Filters are space efficient, but can make bounded one-sided errors: with tunable probability epsilon, they may report that a query element is stored in the filter when it is not. This is called a false positive. Recent research has focused on…

    Submitted 21 May, 2021; originally announced May 2021.

  8. arXiv:1907.01600

    cs.DS

    Approximate Similarity Search Under Edit Distance Using Locality-Sensitive Hashing

    Authors: Samuel McCauley

    Abstract: Edit distance similarity search, also called approximate pattern matching, is a fundamental problem with widespread database applications. The goal of the problem is to preprocess $n$ strings of length $d$, to quickly answer queries $q$ of the form: if there is a database string within edit distance $r$ of $q$, return a database string within edit distance $cr$ of $q$. Previous approaches to this…

    Submitted 8 July, 2020; v1 submitted 2 July, 2019; originally announced July 2019.

  9. arXiv:1807.01389

    cs.GT

    Efficient Rational Proofs with Strong Utility-Gap Guarantees

    Authors: Jing Chen, Samuel McCauley, Shikha Singh

    Abstract: As modern computing moves towards smaller devices and powerful cloud platforms, more and more computation is being delegated to powerful service providers. Interactive proofs are a widely-used model to design efficient protocols for verifiable computation delegation. Rational proofs are payment-based interactive proofs. The payments are designed to incentivize the provers to give correct answers.…

    Submitted 12 September, 2018; v1 submitted 3 July, 2018; originally announced July 2018.

  10. arXiv:1804.05615

    cs.DS

    Adaptive MapReduce Similarity Joins

    Authors: Samuel McCauley, Francesco Silvestri

    Abstract: Similarity joins are a fundamental database operation. Given data sets S and R, the goal of a similarity join is to find all points x in S and y in R with distance at most r. Recent research has investigated how locality-sensitive hashing (LSH) can be used for similarity join, and in particular two recent lines of work have made exciting progress on LSH-based join performance. Hu, Tao, and Yi (POD…

    Submitted 16 April, 2018; originally announced April 2018.

  11. arXiv:1804.03054

    cs.DS

    Set Similarity Search for Skewed Data

    Authors: Samuel McCauley, Jesper W. Mikkelsen, Rasmus Pagh

    Abstract: Set similarity join, as well as the corresponding indexing problem set similarity search, are fundamental primitives for managing noisy or uncertain data. For example, these primitives can be used in data cleaning to identify different representations of the same object. In many cases one can represent an object as a sparse 0-1 vector, or equivalently as the set of nonzero entries in such a vector…

    Submitted 9 April, 2018; originally announced April 2018.

  12. arXiv:1711.01616

    cs.DS

    Bloom Filters, Adaptivity, and the Dictionary Problem

    Authors: Michael A. Bender, Martin Farach-Colton, Mayank Goswami, Rob Johnson, Samuel McCauley, Shikha Singh

    Abstract: The Bloom filter---or, more generally, an approximate membership query data structure (AMQ)---maintains a compact, probabilistic representation of a set S of keys from a universe U. An AMQ supports lookups, inserts, and (for some AMQs) deletes. A query for an x in S is guaranteed to return "present." A query for x not in S returns "absent" with probability at least 1-epsilon, where epsilon is a tu…

    Submitted 26 August, 2018; v1 submitted 5 November, 2017; originally announced November 2017.

  13. Non-Cooperative Rational Interactive Proofs

    Authors: Jing Chen, Samuel McCauley, Shikha Singh

    Abstract: Interactive-proof games model the scenario where an honest party interacts with powerful but strategic provers, to elicit from them the correct answer to a computational question. Interactive proofs are increasingly used as a framework to design protocols for computation outsourcing. Existing interactive-proof games largely fall into two categories: either as games of cooperation such as multi-p…

    Submitted 11 August, 2021; v1 submitted 1 August, 2017; originally announced August 2017.

    Journal ref: In Proceedings of the 27th Annual European Symposium on Algorithms (ESA 2019); Article No. 29; pp. 29:1-29:16

  14. arXiv:1504.08361

    cs.CC

    Rational Proofs with Multiple Provers

    Authors: Jing Chen, Samuel McCauley, Shikha Singh

    Abstract: Interactive proofs (IP) model a world where a verifier delegates computation to an untrustworthy prover, verifying the prover's claims before accepting them. IP protocols have applications in areas such as verifiable computation outsourcing, computation delegation, cloud computing. In these applications, the verifier may pay the prover based on the quality of his work. Rational interactive proofs…

    Submitted 11 November, 2017; v1 submitted 30 April, 2015; originally announced April 2015.

    Comments: Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science. ACM, 2016

  15. arXiv:1504.06501

    cs.DS

    Run Generation Revisited: What Goes Up May or May Not Come Down

    Authors: Michael A. Bender, Samuel McCauley, Andrew McGregor, Shikha Singh, Hoa T. Vu

    Abstract: In this paper, we revisit the classic problem of run generation. Run generation is the first phase of external-memory sorting, where the objective is to scan through the data, reorder elements using a small buffer of size M, and output runs (contiguously sorted chunks of elements) that are as long as possible. We develop algorithms for minimizing the total number of runs (or equivalently, maxim…

    Submitted 24 April, 2015; originally announced April 2015.