-
LearnedKV: Integrating LSM and Learned Index for Superior Performance on SSD
Authors:
Wenlong Wang,
David Hung-Chang Du
Abstract:
In this paper, we introduce LearnedKV, a novel tiered key-value (KV) store that seamlessly integrates a Log-Structured Merge (LSM) tree with a Learned Index. This integration yields superior read and write performance compared to standalone indexing structures on SSDs. Our design capitalizes on the LSM tree's high write/update throughput and the Learned Index's fast read capabilities, enabling each component to leverage its strengths. We analyze the impact of size on LSM tree performance and demonstrate how the tiered Learned Index significantly mitigates the LSM tree's size-related performance degradation, particularly by reducing the intensive I/O operations resulting from re-insertions after Garbage Collection (GC). To maintain rapid read performance for newly inserted keys, we introduce a non-blocking conversion mechanism that efficiently transforms the existing LSM tree into a new Learned Index with minimal overhead during GC. Our experimental results, conducted across diverse workloads, show that LearnedKV outperforms state-of-the-art solutions by up to 1.32x on reads and 1.31x on writes.
Submitted 27 June, 2024;
originally announced June 2024.
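The tiering idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the class names `LearnedIndex` and `TieredKV`, the single linear model, the in-memory dict standing in for the LSM tree, and the blocking `convert` step are all assumptions made for illustration (the paper describes a non-blocking conversion performed during GC on SSD).

```python
import bisect


class LearnedIndex:
    """Toy read-only index: one linear model plus a bounded local search."""

    def __init__(self, items):
        # items: dict mapping numeric keys to values (hypothetical input shape)
        self.keys = sorted(items)
        self.values = [items[k] for k in self.keys]
        n = len(self.keys)
        self.base = self.keys[0] if n else 0
        # Fit a line through the first and last key: key -> approximate position.
        self.slope = (n - 1) / (self.keys[-1] - self.keys[0]) if n > 1 else 0.0
        # The worst prediction error over the stored keys bounds the search window.
        self.err = max(
            (abs(self._predict(k) - i) for i, k in enumerate(self.keys)),
            default=0,
        )

    def _predict(self, key):
        if not self.keys:
            return 0
        pos = int(self.slope * (key - self.base))
        return min(max(pos, 0), len(self.keys) - 1)

    def get(self, key):
        if not self.keys:
            return None
        pos = self._predict(key)
        lo = max(pos - self.err, 0)
        hi = min(pos + self.err + 1, len(self.keys))
        # Binary search only inside the error window around the prediction.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < hi and self.keys[i] == key:
            return self.values[i]
        return None


class TieredKV:
    """Writes go to a small mutable tier; reads fall through to the learned tier."""

    def __init__(self):
        self.lsm = {}  # stand-in for the LSM tree (assumption: a plain dict)
        self.learned = LearnedIndex({})

    def put(self, key, value):
        self.lsm[key] = value

    def get(self, key):
        # The mutable tier holds the newest version, so it is checked first.
        if key in self.lsm:
            return self.lsm[key]
        return self.learned.get(key)

    def convert(self):
        # Blocking simplification of the conversion step: merge both tiers
        # into a fresh learned index and empty the mutable tier.
        merged = dict(zip(self.learned.keys, self.learned.values))
        merged.update(self.lsm)
        self.learned = LearnedIndex(merged)
        self.lsm = {}
```

In this sketch, updates never touch the learned tier directly; they accumulate in the mutable tier until `convert` rebuilds the model, which mirrors the division of labor the abstract describes without modeling any SSD I/O.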
-
DNA Storage: A Promising Large Scale Archival Storage?
Authors:
Yixun Wei,
Bingzhe Li,
David H. C. Du
Abstract:
Deoxyribonucleic Acid (DNA), with its high density and long durability, is a promising storage medium for long-term archival storage and has attracted much attention. Several studies have verified the feasibility of using DNA for archival storage of small amounts of data. However, the achievable storage capacity of DNA as archival storage has not yet been comprehensively investigated. Theoretically, the DNA storage density is about 1 exabyte/mm^3 (10^9 GB/mm^3). However, according to our investigation, the capacity of a DNA storage tube based on current synthesis and sequencing technologies is only hundreds of gigabytes, owing to multiple biological and technological constraints. This paper identifies and investigates the critical factors affecting the capacity of a single DNA tube for archival storage. Finally, we suggest several promising directions to overcome the limitations and enhance DNA storage capacity.
Submitted 13 September, 2022; v1 submitted 4 April, 2022;
originally announced April 2022.
-
TurboKV: Scaling Up The Performance of Distributed Key-Value Stores With In-Switch Coordination
Authors:
Hebatalla Eldakiky,
David Hung-Chang Du,
Eman Ramadan
Abstract:
The power and flexibility of software-defined networks lead to a programmable network infrastructure in which in-network computation can help accelerate the performance of applications. This can be achieved by offloading some computational tasks to the network. However, what kind of computational tasks should be delegated to the network to accelerate application performance? In this paper, we propose using programmable switches to scale up the performance of distributed key-value stores. As a proof of concept, we propose TurboKV, an efficient distributed key-value store architecture that utilizes programmable switches as: 1) partition management nodes that store the key-value store's partition and replica information; and 2) monitoring stations that measure the load of storage nodes, with this monitoring information used to balance the load among storage nodes. We also propose a key-based routing protocol that routes clients' search queries to the targeted storage nodes based on the requested keys. Our experimental results from an initial prototype show that the proposed architecture improves the throughput and reduces the latency of distributed key-value stores compared to existing architectures.
Submitted 26 October, 2020;
originally announced October 2020.
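The two switch roles named in the TurboKV abstract above (holding partition and replica information, and tracking storage-node load) can be sketched in a few lines. This is an illustrative model in ordinary Python, not switch code and not the authors' protocol: the class name `SwitchRouter`, the range-partitioning scheme, the request counter used as a load measure, and the least-loaded replica choice are all assumptions.

```python
import bisect


class SwitchRouter:
    """Toy model of a switch that routes a key to a replica of its partition."""

    def __init__(self, partitions):
        # partitions: list of (exclusive_upper_bound, [replica node names]),
        # sorted by bound. Range partitioning is an assumption of this sketch.
        self.bounds = [bound for bound, _ in partitions]
        self.replicas = [nodes for _, nodes in partitions]
        # Load is approximated by the number of requests routed to each node.
        self.load = {node: 0 for nodes in self.replicas for node in nodes}

    def route(self, key):
        # Find the first partition whose upper bound is greater than the key.
        i = bisect.bisect_right(self.bounds, key)
        if i == len(self.bounds):
            raise KeyError(key)
        # Pick the least-loaded replica; ties go to the first one listed.
        node = min(self.replicas[i], key=lambda n: self.load[n])
        self.load[node] += 1
        return node
```

For example, with `SwitchRouter([(100, ["a", "b"]), (200, ["c"])])`, two successive lookups for keys below 100 are spread across nodes `a` and `b`, while any key in the range 100 to 199 goes to `c`.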