Showing 1–20 of 20 results for author: Walzer, S

  1. arXiv:2404.18497

    cs.DS

    PHOBIC: Perfect Hashing with Optimized Bucket Sizes and Interleaved Coding

    Authors: Stefan Hermann, Hans-Peter Lehmann, Giulio Ermanno Pibiri, Peter Sanders, Stefan Walzer

    Abstract: A minimal perfect hash function (MPHF) maps a set of n keys to {1, ..., n} without collisions. Such functions find widespread application e.g. in bioinformatics and databases. In this paper we revisit PTHash - a construction technique particularly designed for fast queries. PTHash distributes the input keys into small buckets and, for each bucket, it searches for a hash function seed that places i…

    Submitted 29 April, 2024; originally announced April 2024.

  2. arXiv:2404.09607

    cs.DS

    Better space-time-robustness trade-offs for set reconciliation

    Authors: Djamal Belazzougui, Gregory Kucherov, Stefan Walzer

    Abstract: We consider the problem of reconstructing the symmetric difference between similar sets from their representations (sketches) of size linear in the number of differences. Exact solutions to this problem are based on error-correcting coding techniques and suffer from a large decoding time. Existing probabilistic solutions based on Invertible Bloom Lookup Tables (IBLTs) are time-efficient but offer…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 19 pages

  3. arXiv:2403.00736

    math.PR cs.DS

    The Probability to Hit Every Bin with a Linear Number of Balls

    Authors: Stefan Walzer

    Abstract: Assume that $2n$ balls are thrown independently and uniformly at random into $n$ bins. We consider the unlikely event $E$ that every bin receives at least one ball, showing that $\Pr[E] = \Theta(b^n)$ where $b \approx 0.836$. Note that, due to correlations, $b$ is not simply the probability that any single bin receives at least one ball. More generally, we consider the event that throwing $\alpha n$ balls in…

    Submitted 1 March, 2024; originally announced March 2024.
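
The event $E$ in the abstract above can be checked numerically for small $n$. The following is a minimal sketch, not the paper's method: it evaluates $\Pr[E]$ exactly by inclusion-exclusion over the set of empty bins. The function name `prob_every_bin_hit` and the parameter `balls_per_bin` are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def prob_every_bin_hit(n: int, balls_per_bin: int = 2) -> Fraction:
    """Exact probability that all n bins are hit by balls_per_bin * n balls.

    Inclusion-exclusion over k bins forced to stay empty:
    Pr[E] = sum_k (-1)^k * C(n, k) * (1 - k/n)^m, with m balls in total.
    """
    m = balls_per_bin * n
    return sum(
        (-1) ** k * comb(n, k) * Fraction(n - k, n) ** m
        for k in range(n + 1)
    )


if __name__ == "__main__":
    # If Pr[E] = Theta(b^n), the ratio of consecutive probabilities
    # should settle near b as n grows.
    for n in (5, 20, 80):
        ratio = prob_every_bin_hit(n + 1) / prob_every_bin_hit(n)
        print(n, float(ratio))
```

Exact rational arithmetic avoids the cancellation errors that the alternating sum would cause in floating point.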

  4. arXiv:2310.14959

    cs.DS

    ShockHash: Near Optimal-Space Minimal Perfect Hashing Beyond Brute-Force

    Authors: Hans-Peter Lehmann, Peter Sanders, Stefan Walzer

    Abstract: A minimal perfect hash function (MPHF) maps a set S of n keys to the first n integers without collisions. There is a lower bound of n*log(e)=1.44n bits needed to represent an MPHF. This can be reached by a brute-force algorithm that tries e^n hash function seeds in expectation and stores the first seed leading to an MPHF. The most space-efficient previous algorithms for constructing MPHFs all use…

    Submitted 5 June, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Expands arXiv:2310.14959v1 (Bipartite ShockHash) and contains content from arXiv:2308.09561v2 (Plain ShockHash). An earlier title of this article was "Bipartite ShockHash: Pruning ShockHash Search for Efficient Perfect Hashing". Now the article is a preprint of a journal version combining ShockHash and bipartite ShockHash
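
The brute-force baseline mentioned in the abstract above (try seeds until one yields an MPHF, then store that seed) can be sketched as follows. This is only an illustration of the baseline, not of ShockHash itself. The helper names `seeded_hash` and `find_mphf_seed`, the use of BLAKE2b as the seeded hash, and the output range 0..n-1 are assumptions made for this example.

```python
import hashlib
from itertools import count
from typing import Sequence


def seeded_hash(key: str, seed: int, n: int) -> int:
    """Hash key into range(n); the seed is passed as the BLAKE2b salt."""
    digest = hashlib.blake2b(
        key.encode("utf-8"), salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest[:8], "little") % n


def find_mphf_seed(keys: Sequence[str]) -> int:
    """Return the first seed under which the keys map to range(n) bijectively.

    The expected number of trials grows exponentially in n, so this is
    only practical for very small key sets.
    """
    n = len(keys)
    for seed in count():
        if len({seeded_hash(k, seed, n) for k in keys}) == n:
            return seed


if __name__ == "__main__":
    keys = ["apple", "banana", "cherry", "date", "elder", "fig"]
    seed = find_mphf_seed(keys)
    print(seed, [seeded_hash(k, seed, len(keys)) for k in keys])
```

Only the seed needs to be stored to evaluate the function later, which is why the space of this approach is governed by the number of seeds that must be tried.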

  5. arXiv:2308.09561

    cs.DS

    ShockHash: Towards Optimal-Space Minimal Perfect Hashing Beyond Brute-Force

    Authors: Hans-Peter Lehmann, Peter Sanders, Stefan Walzer

    Abstract: A minimal perfect hash function (MPHF) maps a set $S$ of $n$ keys to the first $n$ integers without collisions. There is a lower bound of $n\log_2e-O(\log n)$ bits of space needed to represent an MPHF. A matching upper bound is obtained using the brute-force algorithm that tries random hash functions until stumbling on an MPHF and stores that function's seed. In expectation, $e^n\textrm{poly}(n)$…

    Submitted 13 November, 2023; v1 submitted 18 August, 2023; originally announced August 2023.

  6. arXiv:2307.00644

    cs.DS

    What if we tried Less Power? -- Lessons from studying the power of choices in hashing-based data structures

    Authors: Stefan Walzer

    Abstract: In the first part of this survey, we review how the power of two choices underlies space-efficient data structures like cuckoo hash tables. We'll find that the additional power afforded by more than 2 choices is often outweighed by the additional costs they bring. In the second part, we present a data structure where choices play a role at coarser than per-element granularity. In some sense, we re…

    Submitted 2 July, 2023; originally announced July 2023.

    Comments: This article appeared in the algorithms column of the bulletin of the EATCS, Issue Number 140, June 2023

  7. arXiv:2304.09283

    cs.DS

    Sliding Block Hashing (Slick) -- Basic Algorithmic Ideas

    Authors: Hans-Peter Lehmann, Peter Sanders, Stefan Walzer

    Abstract: We present Sliding Block Hashing (Slick), a simple hash table data structure that combines high performance with very good space efficiency. This preliminary report outlines avenues for analysis and implementation that we intend to pursue.

    Submitted 18 April, 2023; originally announced April 2023.

  8. arXiv:2304.07109

    cs.DS

    Optimal Uncoordinated Unique IDs

    Authors: Peter C. Dillinger, Martín Farach-Colton, Guido Tagliavini, Stefan Walzer

    Abstract: In the Uncoordinated Unique Identifiers Problem (UUIDP) there are $n$ independent instances of an algorithm $\mathcal{A}$ that generates IDs from a universe $\{1, \dots, m\}$, and there is an adversary that requests IDs from these instances. The goal is to design $\mathcal{A}$ such that it minimizes the probability that the same ID is ever generated twice across all instances, that is, minimizes t…

    Submitted 14 April, 2023; originally announced April 2023.

  9. arXiv:2211.03683

    cs.DS

    Simple Set Sketching

    Authors: Jakob Bæk Tejs Houen, Rasmus Pagh, Stefan Walzer

    Abstract: Imagine handling collisions in a hash table by storing, in each cell, the bit-wise exclusive-or of the set of keys hashing there. This appears to be a terrible idea: For $\alpha n$ keys and $n$ buckets, where $\alpha$ is constant, we expect that a constant fraction of the keys will be unrecoverable due to collisions. We show that if this collision resolution strategy is repeated three times independently t…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: To be published at SIAM Symposium on Simplicity in Algorithms (SOSA23)

  10. arXiv:2210.01560

    cs.DS

    SicHash -- Small Irregular Cuckoo Tables for Perfect Hashing

    Authors: Hans-Peter Lehmann, Peter Sanders, Stefan Walzer

    Abstract: A Perfect Hash Function (PHF) is a hash function that has no collisions on a given input set. PHFs can be used for space efficient storage of data in an array, or for determining a compact representative of each object in the set. In this paper, we present the PHF construction algorithm SicHash - Small Irregular Cuckoo Tables for Perfect Hashing. At its core, SicHash uses a known technique: It pla…

    Submitted 8 November, 2022; v1 submitted 4 October, 2022; originally announced October 2022.

  11. arXiv:2202.05546

    cs.DS

    Insertion Time of Random Walk Cuckoo Hashing below the Peeling Threshold

    Authors: Stefan Walzer

    Abstract: Most hash tables have an insertion time of $O(1)$, possibly qualified as expected and/or amortised. While insertions into cuckoo hash tables indeed seem to take $O(1)$ expected time in practice, only polylogarithmic guarantees are proven in all but the simplest of practically relevant cases. Given the widespread use of cuckoo hashing to implement compact dictionaries and Bloom filter alternatives,…

    Submitted 25 April, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

  12. arXiv:2111.06856

    cs.DS

    Approximate Membership Query Filters with a False Positive Free Set

    Authors: Pedro Reviriego, Alfonso Sánchez-Macián, Stefan Walzer, Peter C. Dillinger

    Abstract: In the last decade, significant efforts have been made to reduce the false positive rate of approximate membership checking structures. This has led to the development of new structures such as cuckoo filters and xor filters. Adaptive filters that can react to false positives as they occur to avoid them for future queries to the same elements have also been recently developed. In this paper, we pr…

    Submitted 12 November, 2021; originally announced November 2021.

  13. arXiv:2109.01892

    cs.DS

    Fast Succinct Retrieval and Approximate Membership using Ribbon

    Authors: Peter C. Dillinger, Lorenz Hübschle-Schneider, Peter Sanders, Stefan Walzer

    Abstract: A retrieval data structure for a static function $f:S\rightarrow \{0,1\}^r$ supports queries that return $f(x)$ for any $x \in S$. Retrieval data structures can be used to implement a static approximate membership query data structure (AMQ), i.e., a Bloom filter alternative, with false positive rate $2^{-r}$. The information-theoretic lower bound for both tasks is $r|S|$ bits. While succinct theor…

    Submitted 5 February, 2022; v1 submitted 4 September, 2021; originally announced September 2021.

  14. arXiv:2103.02515

    cs.DS cs.DB

    Ribbon filter: practically smaller than Bloom and Xor

    Authors: Peter C. Dillinger, Stefan Walzer

    Abstract: Filter data structures over-approximate a set of hashable keys, i.e. set membership queries may incorrectly come out positive. A filter with false positive rate $f \in (0,1]$ is known to require $\ge \log_2(1/f)$ bits per key. At least for larger $f \ge 2^{-4}$, existing practical filters require a space overhead of at least 20% with respect to this information-theoretic bound. We introduce the…

    Submitted 8 March, 2021; v1 submitted 3 March, 2021; originally announced March 2021.

    Comments: 14 pages, 7 figures

    ACM Class: E.1; E.2

  15. arXiv:2001.10500

    cs.DS

    Peeling Close to the Orientability Threshold: Spatial Coupling in Hashing-Based Data Structures

    Authors: Stefan Walzer

    Abstract: In multiple-choice data structures each element $x$ in a set $S$ of $m$ keys is associated with a random set $e(x) \subseteq [n]$ of buckets with capacity $\ell \geq 1$ by hash functions. This setting is captured by the hypergraph $H = ([n],\{e(x) \mid x \in S\})$. Accommodating each key in an associated bucket amounts to finding an $\ell$-orientation of $H$ assigning to each hyperedge an incident…

    Submitted 2 November, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

    Comments: This revision makes substantial changes to the presentation of the material. The introduction was completely rewritten. The variable $n$ was rescaled so that $n$ rather than $n(z+1)$ is the number of vertices. Some discussion was added in the experimental section. The technical content (Sections 2-5) is essentially unchanged

  16. arXiv:1907.04750

    cs.DS

    Efficient Gauss Elimination for Near-Quadratic Matrices with One Short Random Block per Row, with Applications

    Authors: Martin Dietzfelbinger, Stefan Walzer

    Abstract: In this paper we identify a new class of sparse near-quadratic random Boolean matrices that have full row rank over $\mathbb{F}_2=\{0,1\}$ with high probability and can be transformed into echelon form in almost linear time by a simple version of Gauss elimination. The random matrix with dimensions $n(1-\varepsilon) \times n$ is generated as follows: In each row, identify a block of length…

    Submitted 10 July, 2019; originally announced July 2019.

  17. arXiv:1907.04749

    cs.DS

    Dense Peelable Random Uniform Hypergraphs

    Authors: Martin Dietzfelbinger, Stefan Walzer

    Abstract: We describe a new family of $k$-uniform hypergraphs with independent random edges. The hypergraphs have a high probability of being peelable, i.e. to admit no sub-hypergraph of minimum degree $2$, even when the edge density (number of edges over vertices) is close to $1$. In our construction, the vertex set is partitioned into linearly arranged segments and each edge is incident to random vertices…

    Submitted 10 July, 2019; originally announced July 2019.

  18. arXiv:1804.11086

    cs.DS

    A Subquadratic Algorithm for 3XOR

    Authors: Martin Dietzfelbinger, Philipp Schlag, Stefan Walzer

    Abstract: Given a set $X$ of $n$ binary words of equal length $w$, the 3XOR problem asks for three elements $a, b, c \in X$ such that $a \oplus b=c$, where $\oplus$ denotes the bitwise XOR operation. The problem can be easily solved on a word RAM with word length $w$ in time $O(n^2 \log{n})$. Using Han's fast integer sorting algorithm (2002/2004) this can be reduced to $O(n^2 \log{\log{n}})$. With randomiz…

    Submitted 30 April, 2018; originally announced April 2018.

    ACM Class: F.2.0
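
The simple $O(n^2 \log n)$ baseline mentioned in the abstract above can be sketched as sorting followed by a binary search per pair. This is one plausible reading of that baseline, not the paper's subquadratic algorithm. The function name `find_3xor` is hypothetical, words are modelled as non-negative Python integers, and the sketch assumes that $a$, $b$, $c$ must be three distinct elements.

```python
from bisect import bisect_left
from typing import Optional, Sequence, Tuple


def find_3xor(words: Sequence[int]) -> Optional[Tuple[int, int, int]]:
    """Return (a, b, c) from words with a ^ b == c, or None if none exists.

    Sort once, then binary-search a ^ b for each of the O(n^2) pairs.
    Assumption: a, b and c must be pairwise distinct, so trivial
    solutions involving the zero word (a ^ 0 == a) are skipped.
    """
    xs = sorted(set(words))
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            target = xs[i] ^ xs[j]
            if target == xs[i] or target == xs[j]:
                continue  # only happens when the other word is 0
            pos = bisect_left(xs, target)
            if pos < len(xs) and xs[pos] == target:
                return xs[i], xs[j], target
    return None


if __name__ == "__main__":
    print(find_3xor([0b001, 0b010, 0b011]))
    print(find_3xor([0b001, 0b010, 0b100]))
```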

  19. arXiv:1707.06855

    cs.DS

    Load Thresholds for Cuckoo Hashing with Overlapping Blocks

    Authors: Stefan Walzer

    Abstract: Dietzfelbinger and Weidling [DW07] proposed a natural variation of cuckoo hashing where each of $cn$ objects is assigned $k = 2$ intervals of size $\ell$ in a linear (or cyclic) hash table of size $n$ and both start points are chosen independently and uniformly at random. Each object must be placed into a table cell within its intervals, but each cell can only hold one object. Experiments suggeste…

    Submitted 17 December, 2019; v1 submitted 21 July, 2017; originally announced July 2017.

  20. arXiv:1412.4100

    math.CO cs.DM

    Playing weighted Tron on Trees

    Authors: Daniel Hoske, Jonathan Rollin, Torsten Ueckerdt, Stefan Walzer

    Abstract: We consider the weighted version of the Tron game on graphs where two players, Alice and Bob, each build their own path by claiming one vertex at a time, starting with Alice. The vertices carry non-negative weights that sum up to 1 and either player tries to claim a path with larger total weight than the opponent. We show that if the graph is a tree then Alice can always ensure to get at most 1/5…

    Submitted 12 December, 2014; originally announced December 2014.

    Comments: 10 pages, 5 figures

    MSC Class: 91A46