-
More Bang For Your Buck(et): Fast and Space-efficient Hardware-accelerated Coarse-granular Indexing on GPUs
Authors:
Justus Henneberg,
Felix Schuhknecht,
Rosina Kharal,
Trevor Brown
Abstract:
In recent work, we have shown that NVIDIA's raytracing cores on RTX video cards can be exploited to realize hardware-accelerated lookups for GPU-resident database indexes. At a high level, the concept materializes all keys as triangles in a 3D scene and indexes them. Lookups are performed by firing rays into the scene and utilizing the index structure to detect hits in a hardware-accelerated fashion. While this approach, called RTIndeX (RX for short), is indeed promising, it currently suffers from three limitations: (1) significant memory overhead per key, (2) slow range-lookups, and (3) poor updateability. In this work, we show that all three problems can be tackled by a single design change: Generalizing RX to become a coarse-granular index, cgRX. Instead of indexing individual keys, cgRX indexes buckets of keys, which are post-filtered after retrieval. This drastically reduces the memory overhead, leads to the generation of a smaller and more efficient index structure, and enables fast range-lookups as well as updates. We will see that representing the buckets in the 3D space such that the lookup of a key is performed both correctly and efficiently requires the careful orchestration of firing rays in a specific sequence. Our experimental evaluation shows that cgRX offers the most bang for the buck(et) by providing a throughput in relation to the memory footprint that is 1.5-3x higher than for the comparable baselines that support range-lookups. At the same time, cgRX improves the range-lookup performance over RX by up to 2x and offers practical updateability that is up to 5.5x faster than rebuilding from scratch.
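The coarse-granular idea described in this abstract (index buckets of keys, then post-filter the retrieved bucket) can be illustrated with a minimal CPU-only sketch. This is not the authors' GPU implementation: it assumes sorted keys split into fixed-size buckets and uses a binary search over one representative per bucket in place of the hardware-accelerated ray lookup. The names `CoarseIndex`, `bucket_size`, `lookup`, and `range_lookup` are hypothetical.

```python
from bisect import bisect_left


class CoarseIndex:
    """Illustrative coarse-granular index: one representative per bucket."""

    def __init__(self, keys, bucket_size=4):
        self.keys = sorted(keys)
        self.bucket_size = bucket_size
        # Assumption for this sketch: each bucket is represented by its
        # largest key, so only len(keys) / bucket_size entries are indexed.
        self.maxima = [
            self.keys[min(start + bucket_size, len(self.keys)) - 1]
            for start in range(0, len(self.keys), bucket_size)
        ]

    def _bucket(self, b):
        start = b * self.bucket_size
        return self.keys[start:start + self.bucket_size]

    def lookup(self, key):
        # Stand-in for the accelerated index probe: find the first bucket
        # whose representative is >= key.
        b = bisect_left(self.maxima, key)
        if b == len(self.maxima):
            return False
        # Post-filter: scan the retrieved bucket for the exact key.
        return key in self._bucket(b)

    def range_lookup(self, lo, hi):
        # Locate the first candidate bucket, then scan consecutive buckets
        # until one starts beyond the upper bound.
        b = bisect_left(self.maxima, lo)
        result = []
        while b < len(self.maxima):
            bucket = self._bucket(b)
            if bucket[0] > hi:
                break
            result.extend(k for k in bucket if lo <= k <= hi)
            b += 1
        return result


index = CoarseIndex([5, 1, 9, 3, 7, 11, 13, 2, 8], bucket_size=3)
print(index.lookup(7), index.lookup(6), index.range_lookup(3, 9))
```

In this sketch, a larger `bucket_size` shrinks the indexed structure at the cost of a longer post-filter scan per lookup, which mirrors the memory-versus-throughput trade-off the abstract refers to.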
Submitted 6 June, 2024;
originally announced June 2024.
-
Taking the Shortcut: Actively Incorporating the Virtual Memory Index of the OS to Hardware-Accelerate Database Indexing
Authors:
Felix Schuhknecht
Abstract:
Index structures often materialize one or multiple levels of explicit indirections (aka pointers) to allow for a quick traversal to the data of interest. Unfortunately, dereferencing a pointer to go from one level to the other is costly since, in addition to following the address, it involves two address translations from virtual memory to physical memory under the hood. In the worst case, such an address translation is resolved by an index access itself, namely by a lookup into the page table, a central hardware-accelerated index structure of the OS. However, if the page table is constantly queried anyway, it raises the question of whether we can actively incorporate it into our database indexes and make it work for us. Precisely, instead of materializing indirections in the form of pointers, we propose to express these indirections directly in the page table wherever possible. By introducing such shortcuts, we (a) effectively reduce the height of traversal during lookups and (b) exploit the hardware-acceleration of lookups in the page table. In this work, we analyze the strengths and considerations of this approach and showcase its effectiveness using the real-world indexing scheme extendible hashing as an example.
Submitted 13 October, 2023;
originally announced October 2023.
-
RTIndeX: Exploiting Hardware-Accelerated GPU Raytracing for Database Indexing
Authors:
Justus Henneberg,
Felix Schuhknecht
Abstract:
Data management on GPUs has become increasingly relevant due to a tremendous rise in processing power and available GPU memory. Similar to main-memory systems, there is a need for performant GPU-resident index structures to speed up query processing. Unfortunately, mapping indexes efficiently to the highly parallel and hard-to-program hardware is challenging and often fails to yield the desired performance and flexibility. Instead of proposing yet another hand-tailored index, we investigate whether we can exploit an indexing mechanism that is already built into modern GPUs: The raytracing hardware accelerator provided by NVIDIA RTX GPUs. To do so, we re-phrase the database indexing problem as a raytracing problem, where we express the dataset to be indexed as objects in a 3D scene, and point/range lookups as rays across the scene. In this combination, coined RX in the following, lookups are performed as intersection tests in hardware by dedicated raytracing cores. To analyze the pros, cons, and usefulness of the raytracing pipeline for database indexing, we carefully evaluate RX along fourteen dimensions and demonstrate its competitiveness and potential in a large variety of situations.
Submitted 27 September, 2023; v1 submitted 2 March, 2023;
originally announced March 2023.
-
The Easiest Way of Turning your Relational Database into a Blockchain -- and the Cost of Doing So
Authors:
Felix Schuhknecht,
Simon Jörz
Abstract:
Blockchain systems essentially consist of two levels: The network level has the responsibility of distributing an ordered stream of transactions to all nodes of the network in exactly the same way, even in the presence of a certain amount of malicious parties (byzantine fault tolerance). On the node level, each node then receives this ordered stream of transactions and executes it within some sort of transaction processing system, typically to alter some kind of state. This clear separation into two levels as well as drastically different application requirements have led to the materialization of the network level in form of so-called blockchain frameworks. While providing all the "blockchain features", these frameworks leave the node level backend flexible or even left to be implemented depending on the specific needs of the application.
In the following paper, we present how to integrate a highly versatile transaction processing system, namely a relational DBMS, into such a blockchain framework. As the framework, we use the popular Tendermint Core, now part of the Ignite/Cosmos eco-system, which can run both public and permissioned networks, and combine it with relational DBMSs as the backend. This results in a "relational blockchain", which is able to run deterministic SQL on a fully replicated relational database. Apart from presenting the integration and its pitfalls, we will carefully evaluate the performance implications of such combinations, in particular, the throughput and latency overhead caused by the blockchain layer on top of the DBMS. As a result, we give recommendations on how to run such a combination of systems efficiently in practice.
Submitted 10 October, 2022;
originally announced October 2022.
-
Towards Adaptive Storage Views in Virtual Memory
Authors:
Felix Schuhknecht,
Justus Henneberg
Abstract:
Traditionally, DBMSs separate their storage layer from their indexing layer. While the storage layer physically materializes the database and provides low-level access methods to it, the indexing layer on top enables a faster locating of searched-for entries. While this clearly separates concerns, it also adds a level of indirection to the already complex execution path. In this work, we propose an alternative design: Instead of conservatively separating both layers, we naturally fuse them by integrating an adaptive coarse-granular indexing scheme directly into the storage layer. We do so by utilizing tools of the virtual memory management subsystem provided by the OS: On the lowest level, we materialize the database content in form of physical main memory. On top of that, we allow the creation of arbitrarily many virtual memory storage views that map to subsets of the database having certain properties of interest. This creation happens fully adaptively as a side-product of query processing. To speed up query answering, we route each query automatically to the most fitting virtual view(s). By this, we naturally index the storage layer in its core and gradually improve the provided scan performance.
Submitted 6 December, 2022; v1 submitted 4 September, 2022;
originally announced September 2022.
-
Northlight: Declarative and Optimized Analysis of Atmospheric Datasets in SparkSQL
Authors:
Justus Henneberg,
Felix Schuhknecht,
Philipp Reutter,
Nils Brast,
Peter Spichtinger
Abstract:
Performing data-intensive analytics is an essential part of modern Earth science. As such, research in atmospheric physics and meteorology frequently requires the processing of very large observational and/or modeled datasets. Typically, these datasets (a) have high dimensionality, i.e. contain various measurements per spatiotemporal point, and (b) are extremely large, containing observations over a long time period. Additionally, (c) the analytical tasks being performed on these datasets are structurally complex. Over the years, the binary format NetCDF has been established as a de-facto standard in distributing and exchanging such multi-dimensional datasets in the Earth science community -- along with tools and APIs to visualize, process, and generate them. Unfortunately, these access methods typically lack either (1) an easy-to-use but rich query interface or (2) an automatic optimization pipeline tailored towards the specialities of these datasets. As such, researchers from the field of Earth sciences (who are typically not computer scientists) unnecessarily struggle to work efficiently with these datasets on a daily basis. Consequently, in this work, we aim at resolving the aforementioned issues. Instead of proposing yet another specialized tool and interface to work with atmospheric datasets, we integrate sophisticated NetCDF processing capabilities into the established SparkSQL dataflow engine -- resulting in our system Northlight. In contrast to comparable systems, Northlight introduces a set of fully automatic optimizations specifically tailored towards NetCDF processing. We experimentally show that Northlight scales gracefully with the selectivity of the analysis tasks and outperforms the comparable state-of-the-art pipeline by up to a factor of 6x.
Submitted 16 September, 2021;
originally announced September 2021.
-
ChainifyDB: How to Blockchainify any Data Management System
Authors:
Felix Martin Schuhknecht,
Ankur Sharma,
Jens Dittrich,
Divya Agrawal
Abstract:
Today's permissioned blockchain systems come in a stand-alone fashion and require the users to integrate yet another full-fledged transaction processing system into their already complex data management landscape. This seems odd as blockchains and traditional DBMSs share large parts of their processing stack. Thus, rather than replacing the established data systems altogether, we advocate to simply 'chainify' them with a blockchain layer on top.
Unfortunately, this task is far more challenging than it sounds: As we want to build upon heterogeneous transaction processing systems, which potentially behave differently, we cannot rely on every organization to execute every transaction deterministically in the same way. Further, as these systems are already filled with data and being used by top-level applications, we also cannot rely on every organization being resilient against tampering with its local data.
Therefore, in this work, we will drop these assumptions and introduce a powerful processing model that avoids them in the first place: The so-called Whatever-LedgerConsensus (WLC) model allows us to create a highly flexible permissioned blockchain layer coined ChainifyDB that (a) is centered around bullet-proof database technology, (b) makes even stronger guarantees than existing permissioned systems, (c) provides a sophisticated recovery mechanism, (d) has an up to 6x higher throughput than the permissioned blockchain system Fabric, and (e) can easily be integrated into an existing heterogeneous database landscape.
Submitted 10 December, 2019;
originally announced December 2019.
-
How to Databasify a Blockchain: the Case of Hyperledger Fabric
Authors:
Ankur Sharma,
Felix Martin Schuhknecht,
Divya Agrawal,
Jens Dittrich
Abstract:
Within the last few years, a countless number of blockchain systems have emerged on the market, each one claiming to revolutionize the way of distributed transaction processing in one way or the other. Many blockchain features, such as byzantine fault tolerance (BFT), are indeed valuable additions in modern environments. However, despite all the hype around the technology, many of the challenges that blockchain systems have to face are fundamental transaction management problems. These are largely shared with traditional database systems, which have been around for decades already.
These similarities become especially visible for systems that blur the lines between blockchain systems and classical database systems. A great example of this is Hyperledger Fabric, an open-source permissioned blockchain system under development by IBM. By having a relaxed view on BFT, the transaction pipeline of Fabric highly resembles the workflow of classical distributed database systems.
This raises two questions: (1) Which conceptual similarities and differences do actually exist between a system such as Fabric and a classical distributed database system? (2) Is it possible to improve on the performance of Fabric by transitioning technology from the database world to blockchains and thus blurring the lines between these two types of systems even further? To tackle these questions, we first explore Fabric from the perspective of database research, where we observe weaknesses in the transaction pipeline. We then solve these issues by transitioning well-understood database concepts to Fabric, namely transaction reordering as well as early transaction abort. Our experimental evaluation shows that our improved version Fabric++ significantly increases the throughput of successful transactions over the vanilla version by up to a factor of 3x.
Submitted 31 October, 2018;
originally announced October 2018.
-
The Case for Automatic Database Administration using Deep Reinforcement Learning
Authors:
Ankur Sharma,
Felix Martin Schuhknecht,
Jens Dittrich
Abstract:
Like any large software system, a full-fledged DBMS offers an overwhelming number of configuration knobs. These range from static initialisation parameters like buffer sizes, degree of concurrency, or level of replication to complex runtime decisions like creating a secondary index on a particular column or reorganising the physical layout of the store. To simplify the configuration, industry-grade DBMSs are usually shipped with various advisory tools that provide recommendations for given workloads and machines. However, reality shows that the actual configuration, tuning, and maintenance is usually still done by a human administrator, relying on intuition and experience. Recent work on deep reinforcement learning has shown very promising results in solving problems that require such a sense of intuition. For instance, it has been applied very successfully in learning how to play complicated games with enormous search spaces. Motivated by these achievements, in this work we explore how deep reinforcement learning can be used to administer a DBMS. First, we will describe how deep reinforcement learning can be used to automatically tune an arbitrary software system like a DBMS by defining a problem environment. Second, we showcase our concept of NoDBA using the concrete example of index selection and evaluate how well it recommends indexes for given workloads.
Submitted 17 January, 2018;
originally announced January 2018.
-
Accelerating Analytical Processing in MVCC using Fine-Granular High-Frequency Virtual Snapshotting
Authors:
Ankur Sharma,
Felix Martin Schuhknecht,
Jens Dittrich
Abstract:
Efficient transactional management is a delicate task. As systems face transactions of inherently different types, ranging from point updates to long-running analytical computations, it is hard to satisfy their individual requirements with a single processing component. Unfortunately, most systems nowadays rely on such a single component that implements its parallelism using multi-version concurrency control (MVCC). While MVCC parallelizes short-running OLTP transactions very well, it struggles in the presence of mixed workloads containing long-running scan-centric OLAP queries, as scans have to work their way through large amounts of versioned data. To overcome this problem, we propose a system that reintroduces the concept of heterogeneous transaction processing: OLAP transactions are outsourced to run on separate (virtual) snapshots while OLTP transactions run on the most recent representation of the database. Inside both components, MVCC ensures a high degree of concurrency. The biggest challenge of such a heterogeneous approach is to generate the snapshots at a high frequency. Previous approaches heavily suffered from the tremendous cost of snapshot creation. In our system, we overcome the restrictions of the OS by introducing a custom system call vm_snapshot, which is hand-tailored to our precise needs: it allows fine-granular snapshot creation at very high frequencies, rendering the snapshot creation phase orders of magnitude faster than state-of-the-art approaches. Our experimental evaluation on a heterogeneous workload based on TPC-H transactions and handcrafted OLTP transactions shows that our system enables significantly higher analytical transaction throughputs on mixed workloads than homogeneous approaches. In this sense, we introduce a system that accelerates Analytical processing by introducing custom Kernel functionalities: AnKerDB.
Submitted 13 September, 2017;
originally announced September 2017.
-
Main Memory Adaptive Indexing for Multi-core Systems
Authors:
Victor Alvarez,
Felix Martin Schuhknecht,
Jens Dittrich,
Stefan Richter
Abstract:
Adaptive indexing is a concept that considers index creation in databases as a by-product of query processing, as opposed to traditional full index creation where the indexing effort is performed up front before answering any queries. Adaptive indexing has received a considerable amount of attention, and several algorithms have been proposed over the past few years, including a recent experimental study comparing a large number of existing methods. Until now, however, most adaptive indexing algorithms have been designed as single-threaded, yet with multi-core systems already well established, the idea of designing parallel algorithms for adaptive indexing is very natural. In this regard, only one parallel algorithm for adaptive indexing has recently appeared in the literature: The parallel version of standard cracking. In this paper we describe three alternative parallel algorithms for adaptive indexing, including a second variant of a parallel standard cracking algorithm. Additionally, we describe a hybrid parallel sorting algorithm, and a NUMA-aware method based on sorting. We then thoroughly compare all these algorithms experimentally, along with a variant of a recently published parallel version of radix sort. Parallel sorting algorithms serve as a realistic baseline for multi-threaded adaptive indexing techniques. In total we experimentally compare seven parallel algorithms. Additionally, we extensively profile all considered algorithms. The initial set of experiments considered in this paper indicates that our parallel algorithms significantly improve over previously known ones. Our results suggest that, although adaptive indexing algorithms are a good design choice in single-threaded environments, the rules change considerably in the parallel case. That is, in future highly-parallel environments, sorting algorithms could be serious alternatives to adaptive indexing.
Submitted 8 April, 2014;
originally announced April 2014.