Search | arXiv e-print repository

Storage Management with Multi-Version Partitioned B-Trees

Abstract: Database Management Systems and K/V-Stores operate on updatable datasets -- massively exceeding the size of available main memory. Tree-based K/V storage management structures became particularly popular in storage engines. B+ Trees allow constant search performance, however write-heavy workloads yield in inefficient write patterns to secondary storage devices and poor performance characteristics.… ▽ More Database Management Systems and K/V-Stores operate on updatable datasets -- massively exceeding the size of available main memory. Tree-based K/V storage management structures became particularly popular in storage engines. B+ Trees allow constant search performance, however write-heavy workloads yield in inefficient write patterns to secondary storage devices and poor performance characteristics. LSM-Trees overcome this issue by horizontal partitioning fractions of data - small enough to fully reside in main memory, but require frequent maintenance to sustain search performance. Firstly, we propose Multi-Version Partitioned BTrees (MV-PBT) as sole storage and index management structure in key-sorted storage engines like K/V-Stores. Secondly, we compare MV-PBT against LSM-Trees. The logical horizontal partitioning in MV-PBT allows leveraging recent advances in modern B$^+$-Tree techniques in a small transparent and memory resident portion of the structure. Structural properties sustain steady read performance, yielding efficient write patterns and reducing write amplification. We integrated MV-PBT in the WiredTiger KV storage engine. MV-PBT offers an up to 2x increased steady throughput in comparison to LSM-Trees and several orders of magnitude in comparison to B+ Trees in a YCSB workload. △ Less

Submitted 20 September, 2022; originally announced September 2022.

Comments: Extended Version, ADBIS 2022

arXiv:2207.04789 [pdf, other]

bloomRF: On Performing Range-Queries in Bloom-Filters with Piecewise-Monotone Hash Functions and Prefix Hashing

Authors: Bernhard Mößner, Christian Riegger, Arthur Bernhardt, Ilia Petrov

Abstract: We introduce bloomRF as a unified method for approximate membership testing that supports both point- and range-queries. As a first core idea, bloomRF introduces novel prefix hashing to efficiently encode range information in the hash-code of the key itself. As a second key concept, bloomRF proposes novel piecewise-monotone hash-functions that preserve local order and support fast range-lookups wi… ▽ More We introduce bloomRF as a unified method for approximate membership testing that supports both point- and range-queries. As a first core idea, bloomRF introduces novel prefix hashing to efficiently encode range information in the hash-code of the key itself. As a second key concept, bloomRF proposes novel piecewise-monotone hash-functions that preserve local order and support fast range-lookups with fewer memory accesses. bloomRF has near-optimal space complexity and constant query complexity. Although, bloomRF is designed for integer domains, it supports floating-points, and can serve as a multi-attribute filter. The evaluation in RocksDB and in a standalone library shows that it is more efficient and outperforms existing point-range-filters by up to 4x across a range of settings and distributions, while kee** the false-positive rate low. △ Less

Submitted 22 July, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

Comments: Extended version. Original accepted at EDBT 2023

arXiv:2012.15596 [pdf, other]

bloomRF: On Performing Range-Queries with Bloom-Filters based on Piecewise-Monotone Hash Functions and Dyadic Trace-Trees

Authors: Christian Riegger, Arthur Bernhardt, Bernhard Moessner, Ilia Petrov

Abstract: We introduce bloomRF as a unified method for approximate membership testing that supports both point- and range-queries on a single data structure. bloomRF extends Bloom-Filters with range query support and may replace them. The core idea is to employ a dyadic interval scheme to determine the set of dyadic intervals covering a data point, which are then encoded and inserted. bloomRF introduces Dya… ▽ More We introduce bloomRF as a unified method for approximate membership testing that supports both point- and range-queries on a single data structure. bloomRF extends Bloom-Filters with range query support and may replace them. The core idea is to employ a dyadic interval scheme to determine the set of dyadic intervals covering a data point, which are then encoded and inserted. bloomRF introduces Dyadic Trace-Trees as novel data structure that represents those covering intervals implicitly. A Trace-Tree encoding scheme represents the set of covering intervals efficiently, in a compact bit representation. Furthermore, bloomRF introduces novel piecewise-monotone hash functions that are locally order-preserving and thus support range querying. We present an efficient membership computation method for range-queries. Although, bloomRF is designed for integers it also supports string and floating-point data types. It can also handle multiple attributes and serve as multi-attribute filter. We evaluate bloomRF in RocksDB and in a standalone library. bloomRF is more efficient and outperforms existing point-range-filters by up to 4x across a range of settings. △ Less

Submitted 31 December, 2020; originally announced December 2020.

arXiv:1910.08023 [pdf, other]

MV-PBT: Multi-Version Index for Large Datasets and HTAP Workloads

Authors: Christian Riegger, Tobias Vincon, Robert Gottstein, Ilia Petrov

Abstract: Modern mixed (HTAP) workloads execute fast update-transactions and long-running analytical queries on the same dataset and system. In multi-version (MVCC) systems, such workloads result in many short-lived versions and long version-chains as well as in increased and frequent maintenance overhead. Consequently, the index pressure increases significantly. Firstly, the frequent modifications cause fr… ▽ More Modern mixed (HTAP) workloads execute fast update-transactions and long-running analytical queries on the same dataset and system. In multi-version (MVCC) systems, such workloads result in many short-lived versions and long version-chains as well as in increased and frequent maintenance overhead. Consequently, the index pressure increases significantly. Firstly, the frequent modifications cause frequent creation of new versions, yielding a surge in index maintenance overhead. Secondly and more importantly, index-scans incur extra I/O overhead to determine, which of the resulting tuple-versions are visible to the executing transaction (visibility-check) as current designs only store version/timestamp information in the base table -- not in the index. Such index-only visibility-check is critical for HTAP workloads on large datasets. In this paper we propose the Multi-Version Partitioned B-Tree (MV-PBT) as a version-aware index structure, supporting index-only visibility checks and flash-friendly I/O patterns. The experimental evaluation indicates a 2x improvement for analytical queries and 15% higher transactional throughput under HTAP workloads (CH-Benchmark). MV-PBT offers 40% higher transactional throughput compared to WiredTiger's LSM-Tree implementation under YCSB. △ Less

Submitted 17 October, 2019; originally announced October 2019.

Showing 1–4 of 4 results for author: Riegger, C