-
Storage Management with Multi-Version Partitioned B-Trees
Authors:
Christian Riegger,
Ilia Petrov
Abstract:
Database Management Systems and K/V-Stores operate on updatable datasets -- massively exceeding the size of available main memory. Tree-based K/V storage management structures became particularly popular in storage engines. B+ Trees allow constant search performance, however write-heavy workloads yield in inefficient write patterns to secondary storage devices and poor performance characteristics.…
▽ More
Database Management Systems and K/V-Stores operate on updatable datasets -- massively exceeding the size of available main memory. Tree-based K/V storage management structures became particularly popular in storage engines. B+ Trees allow constant search performance, however write-heavy workloads yield in inefficient write patterns to secondary storage devices and poor performance characteristics. LSM-Trees overcome this issue by horizontal partitioning fractions of data - small enough to fully reside in main memory, but require frequent maintenance to sustain search performance.
Firstly, we propose Multi-Version Partitioned BTrees (MV-PBT) as sole storage and index management structure in key-sorted storage engines like K/V-Stores. Secondly, we compare MV-PBT against LSM-Trees. The logical horizontal partitioning in MV-PBT allows leveraging recent advances in modern B$^+$-Tree techniques in a small transparent and memory resident portion of the structure. Structural properties sustain steady read performance, yielding efficient write patterns and reducing write amplification.
We integrated MV-PBT in the WiredTiger KV storage engine. MV-PBT offers an up to 2x increased steady throughput in comparison to LSM-Trees and several orders of magnitude in comparison to B+ Trees in a YCSB workload.
△ Less
Submitted 20 September, 2022;
originally announced September 2022.
-
bloomRF: On Performing Range-Queries in Bloom-Filters with Piecewise-Monotone Hash Functions and Prefix Hashing
Authors:
Bernhard Mößner,
Christian Riegger,
Arthur Bernhardt,
Ilia Petrov
Abstract:
We introduce bloomRF as a unified method for approximate membership testing that supports both point- and range-queries. As a first core idea, bloomRF introduces novel prefix hashing to efficiently encode range information in the hash-code of the key itself. As a second key concept, bloomRF proposes novel piecewise-monotone hash-functions that preserve local order and support fast range-lookups wi…
▽ More
We introduce bloomRF as a unified method for approximate membership testing that supports both point- and range-queries. As a first core idea, bloomRF introduces novel prefix hashing to efficiently encode range information in the hash-code of the key itself. As a second key concept, bloomRF proposes novel piecewise-monotone hash-functions that preserve local order and support fast range-lookups with fewer memory accesses. bloomRF has near-optimal space complexity and constant query complexity. Although, bloomRF is designed for integer domains, it supports floating-points, and can serve as a multi-attribute filter. The evaluation in RocksDB and in a standalone library shows that it is more efficient and outperforms existing point-range-filters by up to 4x across a range of settings and distributions, while kee** the false-positive rate low.
△ Less
Submitted 22 July, 2022; v1 submitted 11 July, 2022;
originally announced July 2022.
-
bloomRF: On Performing Range-Queries with Bloom-Filters based on Piecewise-Monotone Hash Functions and Dyadic Trace-Trees
Authors:
Christian Riegger,
Arthur Bernhardt,
Bernhard Moessner,
Ilia Petrov
Abstract:
We introduce bloomRF as a unified method for approximate membership testing that supports both point- and range-queries on a single data structure. bloomRF extends Bloom-Filters with range query support and may replace them. The core idea is to employ a dyadic interval scheme to determine the set of dyadic intervals covering a data point, which are then encoded and inserted. bloomRF introduces Dya…
▽ More
We introduce bloomRF as a unified method for approximate membership testing that supports both point- and range-queries on a single data structure. bloomRF extends Bloom-Filters with range query support and may replace them. The core idea is to employ a dyadic interval scheme to determine the set of dyadic intervals covering a data point, which are then encoded and inserted. bloomRF introduces Dyadic Trace-Trees as novel data structure that represents those covering intervals implicitly. A Trace-Tree encoding scheme represents the set of covering intervals efficiently, in a compact bit representation. Furthermore, bloomRF introduces novel piecewise-monotone hash functions that are locally order-preserving and thus support range querying. We present an efficient membership computation method for range-queries. Although, bloomRF is designed for integers it also supports string and floating-point data types. It can also handle multiple attributes and serve as multi-attribute filter.
We evaluate bloomRF in RocksDB and in a standalone library. bloomRF is more efficient and outperforms existing point-range-filters by up to 4x across a range of settings.
△ Less
Submitted 31 December, 2020;
originally announced December 2020.
-
MV-PBT: Multi-Version Index for Large Datasets and HTAP Workloads
Authors:
Christian Riegger,
Tobias Vincon,
Robert Gottstein,
Ilia Petrov
Abstract:
Modern mixed (HTAP) workloads execute fast update-transactions and long-running analytical queries on the same dataset and system. In multi-version (MVCC) systems, such workloads result in many short-lived versions and long version-chains as well as in increased and frequent maintenance overhead. Consequently, the index pressure increases significantly. Firstly, the frequent modifications cause fr…
▽ More
Modern mixed (HTAP) workloads execute fast update-transactions and long-running analytical queries on the same dataset and system. In multi-version (MVCC) systems, such workloads result in many short-lived versions and long version-chains as well as in increased and frequent maintenance overhead. Consequently, the index pressure increases significantly. Firstly, the frequent modifications cause frequent creation of new versions, yielding a surge in index maintenance overhead. Secondly and more importantly, index-scans incur extra I/O overhead to determine, which of the resulting tuple-versions are visible to the executing transaction (visibility-check) as current designs only store version/timestamp information in the base table -- not in the index. Such index-only visibility-check is critical for HTAP workloads on large datasets. In this paper we propose the Multi-Version Partitioned B-Tree (MV-PBT) as a version-aware index structure, supporting index-only visibility checks and flash-friendly I/O patterns. The experimental evaluation indicates a 2x improvement for analytical queries and 15% higher transactional throughput under HTAP workloads (CH-Benchmark). MV-PBT offers 40% higher transactional throughput compared to WiredTiger's LSM-Tree implementation under YCSB.
△ Less
Submitted 17 October, 2019;
originally announced October 2019.