Search | arXiv e-print repository

To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem?

Authors: Russell Sears, Catharine Van Ingen, Jim Gray

Abstract: Application designers often face the question of whether to store large objects in a filesystem or in a database. Often this decision is made for application design simplicity. Sometimes, performance measurements are also used. This paper looks at the question of fragmentation - one of the operational issues that can affect the performance and/or manageability of the system as deployed long term… ▽ More Application designers often face the question of whether to store large objects in a filesystem or in a database. Often this decision is made for application design simplicity. Sometimes, performance measurements are also used. This paper looks at the question of fragmentation - one of the operational issues that can affect the performance and/or manageability of the system as deployed long term. As expected from the common wisdom, objects smaller than 256KB are best stored in a database while objects larger than 1M are best stored in the filesystem. Between 256KB and 1MB, the read:write ratio and rate of object overwrite or replacement are important factors. We used the notion of "storage age" or number of object overwrites as way of normalizing wall clock time. Storage age allows our results or similar such results to be applied across a number of read:write ratios and object replacement rates. △ Less

Submitted 25 January, 2007; originally announced January 2007.

Report number: MSR-TR-2006-45

Journal ref: CIDR 2007

arXiv:cs/0701166 [pdf]

Empirical Measurements of Disk Failure Rates and Error Rates

Authors: Jim Gray, Catharine van Ingen

Abstract: The SATA advertised bit error rate of one error in 10 terabytes is frightening. We moved 2 PB through low-cost hardware and saw five disk read error events, several controller failures, and many system reboots caused by security patches. We conclude that SATA uncorrectable read errors are not yet a dominant system-fault source - they happen, but are rare compared to other problems. We also concl… ▽ More The SATA advertised bit error rate of one error in 10 terabytes is frightening. We moved 2 PB through low-cost hardware and saw five disk read error events, several controller failures, and many system reboots caused by security patches. We conclude that SATA uncorrectable read errors are not yet a dominant system-fault source - they happen, but are rare compared to other problems. We also conclude that UER (uncorrectable error rate) is not the relevant metric for our needs. When an uncorrectable read error happens, there are typically several damaged storage blocks (and many uncorrectable read errors.) Also, some uncorrectable read errors may be masked by the operating system. The more meaningful metric for data architects is Mean Time To Data Loss (MTTDL.) △ Less

Submitted 25 January, 2007; originally announced January 2007.

Report number: MSR-TR-2005-166

arXiv:cs/0612111 [pdf]

Fragmentation in Large Object Repositories

Authors: Russell Sears, Catharine van Ingen

Abstract: Fragmentation leads to unpredictable and degraded application performance. While these problems have been studied in detail for desktop filesystem workloads, this study examines newer systems such as scalable object stores and multimedia repositories. Such systems use a get/put interface to store objects. In principle, databases and filesystems can support such applications efficiently, allowing… ▽ More Fragmentation leads to unpredictable and degraded application performance. While these problems have been studied in detail for desktop filesystem workloads, this study examines newer systems such as scalable object stores and multimedia repositories. Such systems use a get/put interface to store objects. In principle, databases and filesystems can support such applications efficiently, allowing system designers to focus on complexity, deployment cost and manageability. Although theoretical work proves that certain storage policies behave optimally for some workloads, these policies often behave poorly in practice. Most storage benchmarks focus on short-term behavior or do not measure fragmentation. We compare SQL Server to NTFS and find that fragmentation dominates performance when object sizes exceed 256KB-1MB. NTFS handles fragmentation better than SQL Server. Although the performance curves will vary with other systems and workloads, we expect the same interactions between fragmentation and free space to apply. It is well-known that fragmentation is related to the percentage free space. We found that the ratio of free space to object size also impacts performance. Surprisingly, in both systems, storing objects of a single size causes fragmentation, and changing the size of write requests affects fragmentation. These problems could be addressed with simple changes to the filesystem and database interfaces. It is our hope that an improved understanding of fragmentation will lead to predictable storage systems that require less maintenance after deployment. △ Less

Submitted 21 December, 2006; originally announced December 2006.

Comments: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/.) You may copy, distribute, display, and perform the work, make derivative works and make commercial use of the work, but, you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR) January 710, 2007, Asilomar, California, USA

Showing 1–3 of 3 results for author: Van Ingen, C