-
SquirrelFS: using the Rust compiler to check file-system crash consistency
Authors:
Hayley LeBlanc,
Nathan Taylor,
James Bornholt,
Vijay Chidambaram
Abstract:
This work introduces a new approach to building crash-safe file systems for persistent memory. We exploit the fact that Rust's typestate pattern allows compile-time enforcement of a specific order of operations. We introduce a novel crash-consistency mechanism, Synchronous Soft Updates, that reduces crash safety to enforcing ordering among updates to file-system metadata. We employ this approach to build SquirrelFS, a new file system with crash-consistency guarantees that are checked at compile time. SquirrelFS avoids the need for separate proofs, instead incorporating correctness guarantees into the typestate itself. Compiling SquirrelFS takes only tens of seconds; successful compilation indicates crash consistency, while an error provides a starting point for fixing the bug. We evaluate SquirrelFS against state-of-the-art file systems such as NOVA and WineFS, and find that SquirrelFS achieves similar or better performance on a wide range of benchmarks and applications.
Submitted 13 June, 2024;
originally announced June 2024.
-
DINOMO: An Elastic, Scalable, High-Performance Key-Value Store for Disaggregated Persistent Memory (Extended Version)
Authors:
Sekwon Lee,
Soujanya Ponnapalli,
Sharad Singhal,
Marcos K. Aguilera,
Kimberly Keeton,
Vijay Chidambaram
Abstract:
We present Dinomo, a novel key-value store for disaggregated persistent memory (DPM). Dinomo is the first key-value store for DPM that simultaneously achieves high common-case performance, scalability, and lightweight online reconfiguration. We observe that previously proposed key-value stores for DPM had architectural limitations that prevent them from achieving all three goals simultaneously. Dinomo uses a novel combination of techniques such as ownership partitioning, disaggregated adaptive caching, selective replication, and lock-free and log-free indexing to achieve these goals. Compared to a state-of-the-art DPM key-value store, Dinomo achieves at least 3.8x better throughput on various workloads at scale and higher scalability, while providing fast reconfiguration.
Submitted 18 September, 2022;
originally announced September 2022.
-
Finding and Analyzing Crash-Consistency Bugs in Persistent-Memory File Systems
Authors:
Hayley LeBlanc,
Shankara Pailoor,
Isil Dillig,
James Bornholt,
Vijay Chidambaram
Abstract:
We present a study of crash-consistency bugs in persistent-memory (PM) file systems and analyze their implications for file-system design and crash-consistency testing. We develop FlyTrap, a framework to test PM file systems for crash-consistency bugs. FlyTrap discovered 18 new bugs across four PM file systems; the bugs have been confirmed by developers, and many have already been fixed. The discovered bugs have serious consequences, such as breaking the atomicity of rename or making the file system unmountable. We present a detailed study of the bugs we found and discuss some important lessons from these observations. For instance, one of our findings is that many of the bugs are due to logic errors rather than errors in using flushes or fences; this has important consequences for future work on testing PM file systems. Another key finding is that many bugs arise from attempts to improve efficiency by performing metadata updates in place, and that recovery code that rebuilds in-DRAM state is a significant source of bugs. These observations have important implications for designing and testing PM file systems. Our code is available at https://github.com/utsaslab/flytrap.
Submitted 12 April, 2022;
originally announced April 2022.
-
TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches
Authors:
Aashaka Shah,
Vijay Chidambaram,
Meghan Cowan,
Saeed Maleki,
Madan Musuvathi,
Todd Mytkowicz,
Jacob Nelson,
Olli Saarikivi,
Rachee Singh
Abstract:
Machine learning models are increasingly being trained across multiple GPUs and servers. In this setting, data is transferred between GPUs using communication collectives such as AlltoAll and AllReduce, which can become a significant bottleneck in training large models. Thus, it is important to use efficient algorithms for collective communication. We develop TACCL, a tool that enables algorithm designers to guide a synthesizer into automatically generating algorithms for a given hardware configuration and communication collective. TACCL uses a novel communication sketch abstraction to obtain crucial information from the designer, which significantly reduces the search space and guides the synthesizer towards better algorithms. TACCL also uses a novel encoding of the problem that allows it to scale beyond single-node topologies. We use TACCL to synthesize algorithms for three collectives and two hardware topologies: DGX-2 and NDv2. We demonstrate that the algorithms synthesized by TACCL outperform the NVIDIA Collective Communication Library (NCCL) by up to 6.7x. We also show that TACCL can speed up end-to-end training of Transformer-XL and BERT models by 11% to 2.3x for different batch sizes.
Submitted 5 October, 2022; v1 submitted 8 November, 2021;
originally announced November 2021.
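To make concrete what an AllReduce collective computes, the sketch below simulates the classic ring AllReduce (a reduce-scatter phase followed by an all-gather phase) among n in-process "ranks", each holding one chunk per rank. This is a hand-written textbook algorithm used only for illustration; it is not an algorithm synthesized by TACCL, and the function name ring_allreduce is hypothetical.

```python
def ring_allreduce(buffers):
    """Simulate ring AllReduce over n ranks, each holding n chunks.

    Returns new per-rank buffers in which every chunk equals the sum of
    that chunk across all ranks. "Sending" is modeled as list updates.
    """
    n = len(buffers)
    data = [list(b) for b in buffers]  # copy so the inputs are untouched
    if any(len(b) != n for b in data):
        raise ValueError("each rank must hold exactly one chunk per rank")

    # Phase 1: reduce-scatter. In step s, rank r sends chunk (r - s) % n to
    # its ring neighbour (r + 1) % n, which adds it to its own copy.
    for s in range(n - 1):
        sends = [(r, (r - s) % n, data[r][(r - s) % n]) for r in range(n)]
        for r, chunk, val in sends:  # all sends in a step happen "at once"
            data[(r + 1) % n][chunk] += val

    # Now rank r holds the fully reduced chunk (r + 1) % n.
    # Phase 2: all-gather. In step s, rank r forwards chunk (r + 1 - s) % n,
    # and the neighbour overwrites its stale copy.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n, data[r][(r + 1 - s) % n]) for r in range(n)]
        for r, chunk, val in sends:
            data[(r + 1) % n][chunk] = val
    return data


result = ring_allreduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
```

Each rank sends and receives 2(n - 1) chunks in total regardless of n, which is why the ring schedule is a common hand-tuned baseline; a synthesizer searches for schedules that better fit a specific topology.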
-
Synergy: Resource Sensitive DNN Scheduling in Multi-Tenant Clusters
Authors:
Jayashree Mohan,
Amar Phanishayee,
Janardhan Kulkarni,
Vijay Chidambaram
Abstract:
Training Deep Neural Networks (DNNs) is a widely popular workload in both enterprises and cloud data centers. Existing schedulers for DNN training consider the GPU as the dominant resource and allocate other resources, such as CPU and memory, in proportion to the number of GPUs requested by the job. Unfortunately, these schedulers do not consider the impact of a job's sensitivity to the allocation of CPU, memory, and storage resources. In this work, we propose Synergy, a resource-sensitive scheduler for shared GPU clusters. Synergy infers the sensitivity of DNNs to different resources using optimistic profiling; some jobs might benefit from more than the GPU-proportional allocation, and some jobs might not be affected by less than the GPU-proportional allocation. Synergy performs such multi-resource, workload-aware assignments across a set of jobs scheduled on shared multi-tenant clusters using a new near-optimal online algorithm. Our experiments show that workload-aware CPU and memory allocations can improve average job completion time (JCT) by up to 3.4x compared to traditional GPU-proportional scheduling.
Submitted 24 August, 2022; v1 submitted 12 October, 2021;
originally announced October 2021.
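The GPU-proportional baseline that Synergy is compared against can be written as a few lines of arithmetic: a job receives the same fraction of a server's CPU and memory as the fraction of its GPUs it requests. The sketch below shows only that baseline; the Server type, the function name, and the server sizes in the usage line are hypothetical, and Synergy's own profiling and allocation algorithm are not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical server description used for illustration."""
    gpus: int
    cpu_cores: int
    memory_gb: float


def gpu_proportional_share(server: Server, gpus_requested: int) -> dict:
    """Baseline policy: CPU and memory scale with the job's share of GPUs."""
    if not 0 < gpus_requested <= server.gpus:
        raise ValueError("job must request between 1 and server.gpus GPUs")
    fraction = gpus_requested / server.gpus
    return {
        "gpus": gpus_requested,
        "cpu_cores": server.cpu_cores * fraction,
        "memory_gb": server.memory_gb * fraction,
    }


# A 2-GPU job on a hypothetical 8-GPU, 24-core, 500 GB server.
job = gpu_proportional_share(Server(gpus=8, cpu_cores=24, memory_gb=500.0), 2)
```

Here the job gets 6 cores and 125 GB regardless of whether its data pipeline is CPU-hungry or not; that insensitivity is the gap the abstract describes.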
-
PAIO: A Software-Defined Storage Data Plane Framework
Authors:
Ricardo Macedo,
Yusuke Tanimura,
Jason Haga,
Vijay Chidambaram,
José Pereira,
João Paulo
Abstract:
We propose PAIO, the first general-purpose framework that enables system designers to build custom-made Software-Defined Storage (SDS) data plane stages. It provides the means to implement storage optimizations adaptable to different workflows and user-defined policies, and allows straightforward integration with existing applications and I/O layers. PAIO allows stages to be integrated with modern SDS control planes to ensure holistic control and system-wide optimal performance. We demonstrate the performance and applicability of PAIO with two use cases. The first improves 99th percentile latency by 4x in industry-standard LSM-based key-value stores. The second ensures dynamic per-application bandwidth guarantees under shared storage environments.
Submitted 12 August, 2021; v1 submitted 7 June, 2021;
originally announced June 2021.
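The second PAIO use case enforces per-application bandwidth guarantees. A common mechanism for capping the bandwidth of an I/O stream is a token bucket whose rate a control plane can adjust at run time. The sketch below assumes that mechanism purely for illustration; it is not taken from PAIO's code, the class and method names are hypothetical, and time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Token-bucket limiter: refills `rate` bytes/second, up to `capacity` bytes."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full
        self.last = 0.0         # time of the last refill, in seconds

    def _refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_consume(self, nbytes: float, now: float) -> bool:
        """Admit an I/O of `nbytes` at time `now` if enough tokens remain."""
        self._refill(now)
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False

    def set_rate(self, rate: float, now: float) -> None:
        """Control-plane hook: change the rate, e.g. to rebalance guarantees."""
        self._refill(now)  # account for time elapsed under the old rate
        self.rate = rate


bucket = TokenBucket(rate=100, capacity=100)  # 100 bytes/s, hypothetical
```

A data plane stage would call try_consume before forwarding each request and delay it otherwise, while a controller calls set_rate as applications come and go.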
-
Memory Optimization for Deep Networks
Authors:
Aashaka Shah,
Chao-Yuan Wu,
Jayashree Mohan,
Vijay Chidambaram,
Philipp Krähenbühl
Abstract:
Deep learning is slowly, but steadily, hitting a memory bottleneck. While the tensor computation in top-of-the-line GPUs increased by 32x over the last five years, the total available memory only grew by 2.5x. This prevents researchers from exploring larger architectures, as training large networks requires more memory for storing intermediate outputs. In this paper, we present MONeT, an automatic framework that minimizes both the memory footprint and computational overhead of deep networks. MONeT jointly optimizes the checkpointing schedule and the implementation of various operators. MONeT is able to outperform all prior hand-tuned operations as well as automated checkpointing. MONeT reduces the overall memory requirement by 3x for various PyTorch models, with a 9-16% overhead in computation. For the same computation cost, MONeT requires 1.2-1.8x less memory than current state-of-the-art automated checkpointing frameworks. Our code is available at https://github.com/utsaslab/MONeT.
Submitted 2 April, 2021; v1 submitted 27 October, 2020;
originally announced October 2020.
-
Towards Software-Defined Data Protection: GDPR Compliance at the Storage Layer is Within Reach
Authors:
Zsolt Istvan,
Soujanya Ponnapalli,
Vijay Chidambaram
Abstract:
Enforcing data protection and privacy rules within large data processing applications is becoming increasingly important, especially in light of GDPR and similar regulatory frameworks. Most modern data processing happens on top of a distributed storage layer, and securing this layer against accidental or malicious misuse is crucial to ensuring global privacy guarantees. However, the performance overhead and additional complexity of doing so are often assumed to be significant; in this work we describe a path forward that tackles both challenges. We propose "Software-Defined Data Protection" (SDP), an adaptation of the "Software-Defined Storage" approach to non-performance aspects: a trusted controller translates company- and application-specific policies into a set of rules deployed on the storage nodes. These nodes, in turn, apply the rules at line rate but do not make any decisions on their own. Such an approach decouples frequently changing policies from request-level enforcement and allows storage nodes to implement the latter more efficiently.
Even though in-storage processing brings challenges, mainly because it can jeopardize line-rate processing, we argue that today's Smart Storage solutions can already implement the required functionality, thanks to the separation of concerns introduced by SDP. We highlight the challenges that remain, especially that of trusting the storage nodes. These need to be tackled before we can reach widespread adoption in cloud environments.
Submitted 11 August, 2020;
originally announced August 2020.
-
Analyzing and Mitigating Data Stalls in DNN Training
Authors:
Jayashree Mohan,
Amar Phanishayee,
Ashish Raniwala,
Vijay Chidambaram
Abstract:
Training Deep Neural Networks (DNNs) is resource-intensive and time-consuming. While prior research has explored many different ways of reducing DNN training time, the impact of the input data pipeline, i.e., fetching raw data items from storage and performing data pre-processing in memory, has been relatively unexplored. This paper makes the following contributions: (1) We present the first comprehensive analysis of how the input data pipeline affects the training time of widely used computer vision and audio DNNs, which typically involve complex data pre-processing. We analyze nine different models across three tasks and four datasets while varying factors such as the amount of memory, number of CPU threads, storage device, and GPU generation, on servers that are part of a large production cluster at Microsoft. We find that in many cases, DNN training time is dominated by data stall time: time spent waiting for data to be fetched and pre-processed. (2) We build a tool, DS-Analyzer, to precisely measure data stalls using a differential technique and to perform predictive what-if analysis on data stalls. (3) Finally, based on the insights from our analysis, we design and implement three simple but effective techniques in a data-loading library, CoorDL, to mitigate data stalls. Our experiments on a range of DNN tasks, models, datasets, and hardware configurations show that when PyTorch uses CoorDL instead of the state-of-the-art DALI data-loading library, DNN training time is reduced significantly (by as much as 5x on a single server).
Submitted 19 January, 2021; v1 submitted 13 July, 2020;
originally announced July 2020.
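The idea behind a differential measurement of data stalls can be illustrated with simple arithmetic over three epoch-time measurements: one with synthetic data (pure compute, no stalls), one with the dataset fully cached in memory (which adds pre-processing stalls), and one with a cold cache (which also adds fetch stalls). The sketch assumes this three-run setup; how DS-Analyzer actually collects its measurements is not described in the abstract, and the function and parameter names are hypothetical.

```python
def data_stall_breakdown(t_synthetic: float, t_cached: float, t_cold: float) -> dict:
    """Split one epoch's time into compute, prep-stall and fetch-stall parts.

    t_synthetic: epoch time with synthetic in-memory data (compute only).
    t_cached:    epoch time with the whole dataset cached in DRAM.
    t_cold:      epoch time reading from storage with a cold cache.
    """
    if not t_synthetic <= t_cached <= t_cold:
        raise ValueError("expected t_synthetic <= t_cached <= t_cold")
    prep_stall = t_cached - t_synthetic   # extra time due to pre-processing
    fetch_stall = t_cold - t_cached       # extra time due to storage I/O
    return {
        "compute": t_synthetic,
        "prep_stall": prep_stall,
        "fetch_stall": fetch_stall,
        "stall_fraction": (prep_stall + fetch_stall) / t_cold,
    }


# Hypothetical measurements, in seconds per epoch.
breakdown = data_stall_breakdown(t_synthetic=100, t_cached=150, t_cold=200)
```

With these made-up numbers, half of the cold-cache epoch is spent stalled, split evenly between pre-processing and fetching.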
-
GDPR Anti-Patterns: How Design and Operation of Modern Cloud-scale Systems Conflict with GDPR
Authors:
Supreeth Shastri,
Melissa Wasserman,
Vijay Chidambaram
Abstract:
In recent years, our society has been plagued by unprecedented levels of privacy and security breaches. To rein in this trend, the European Union, in 2018, introduced comprehensive legislation called the General Data Protection Regulation (GDPR). In this article, we review GDPR from a systems perspective and identify how the design and operation of modern cloud-scale systems conflict with this regulation. We illustrate these conflicts via six GDPR anti-patterns: storing data without a clear timeline for deletion; reusing data indiscriminately; creating walled gardens and black markets; risk-agnostic data processing; hiding data breaches; and making unexplainable decisions. Our findings reveal a deep-rooted tussle between GDPR requirements and how cloud-scale systems that process personal data have evolved in the modern era. While it is imperative to avoid these anti-patterns, we believe that achieving compliance requires comprehensive, ground-up solutions; anything short would amount to fixing a leaky faucet in a sinking ship.
Submitted 31 October, 2019;
originally announced November 2019.
-
Understanding and Benchmarking the Impact of GDPR on Database Systems
Authors:
Supreeth Shastri,
Vinay Banakar,
Melissa Wasserman,
Arun Kumar,
Vijay Chidambaram
Abstract:
The General Data Protection Regulation (GDPR) provides new rights and protections to European people concerning their personal data. We analyze GDPR from a systems perspective, translating its legal articles into a set of capabilities and characteristics that compliant systems must support. Our analysis reveals the phenomenon of metadata explosion, wherein large quantities of metadata need to be stored along with the personal data to satisfy the GDPR requirements. Our analysis also helps us identify new workloads that must be supported under GDPR. We design and implement an open-source benchmark called GDPRbench that consists of the workloads and metrics needed to understand and assess personal-data processing database systems. To gauge the readiness of modern database systems for GDPR, we follow best practices and developer recommendations to modify Redis, PostgreSQL, and a commercial database system to be GDPR compliant. Our experiments demonstrate that the resulting GDPR-compliant systems achieve poor performance on GDPR workloads, and that performance scales poorly as the volume of personal data increases. We discuss the real-world implications of these findings and identify research challenges towards making GDPR compliance efficient in production environments. We release all of our software artifacts and datasets at http://www.gdprbench.org
Submitted 16 March, 2020; v1 submitted 1 October, 2019;
originally announced October 2019.
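Metadata explosion is easy to see on a single record: once a store must track why data may be processed, when it must be deleted, where it came from, what the user objected to, and who it was shared with, the metadata can dwarf the personal data itself. The sketch below attaches such metadata to a small key-value record and compares serialized sizes. The field names and values are hypothetical illustrations, not GDPRbench's schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PersonalRecord:
    """A personal-data record plus hypothetical GDPR-motivated metadata."""
    key: str
    value: str
    purposes: list = field(default_factory=list)     # allowed processing purposes
    ttl_seconds: int = 0                              # deletion timeline
    origin: str = ""                                  # where the data came from
    objections: list = field(default_factory=list)   # purposes the user objected to
    shared_with: list = field(default_factory=list)  # third parties with access

    def data_bytes(self) -> int:
        """Size of the personal data itself (key + value, UTF-8)."""
        return len(self.key.encode()) + len(self.value.encode())

    def stored_bytes(self) -> int:
        """Size of the whole record, metadata included, serialized as JSON."""
        return len(json.dumps(asdict(self)).encode())

    def metadata_ratio(self) -> float:
        """Bytes stored per byte of actual personal data."""
        return self.stored_bytes() / self.data_bytes()


record = PersonalRecord(
    key="user:42",
    value="alice@example.com",
    purposes=["billing", "analytics"],
    ttl_seconds=30 * 24 * 3600,
    origin="signup-form",
    objections=["marketing"],
    shared_with=["payments-processor"],
)
```

For this 24-byte record, the serialized form is several times larger than the data it protects, and every GDPR query (for example, "delete everything past its TTL") must now scan or index that metadata.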
-
RECIPE: Converting Concurrent DRAM Indexes to Persistent-Memory Indexes
Authors:
Se Kwon Lee,
Jayashree Mohan,
Sanidhya Kashyap,
Taesoo Kim,
Vijay Chidambaram
Abstract:
We present Recipe, a principled approach for converting concurrent DRAM indexes into crash-consistent indexes for persistent memory (PM). The main insight behind Recipe is that the isolation provided by a certain class of concurrent in-memory indexes can be translated, with small changes, into crash consistency when the same index is used in PM. We present a set of conditions that enable the identification of this class of DRAM indexes, and the actions to be taken to convert each index to be persistent. Based on these conditions and conversion actions, we modify five different DRAM indexes based on B+ trees, tries, radix trees, and hash tables into their crash-consistent PM counterparts. The effort involved in this conversion is minimal, requiring 30-200 lines of code. We evaluated the converted PM indexes on Intel DC Persistent Memory and found that they outperform state-of-the-art, hand-crafted PM indexes in multi-threaded workloads by up to 5.2x. For example, we built P-CLHT, our PM implementation of the CLHT hash table, by modifying only 30 LOC. When running YCSB workloads, P-CLHT performs up to 2.4x better than Cacheline-Conscious Extendible Hashing (CCEH), the state-of-the-art PM hash table.
Submitted 8 November, 2019; v1 submitted 22 September, 2019;
originally announced September 2019.
-
Rainblock: Faster Transaction Processing in Public Blockchains
Authors:
Soujanya Ponnapalli,
Aashaka Shah,
Amy Tai,
Souvik Banerjee,
Vijay Chidambaram,
Dahlia Malkhi,
Michael Wei
Abstract:
Public blockchains like Ethereum use Merkle trees to verify transactions received from untrusted servers before applying them to the blockchain. We empirically show that the low throughput of such blockchains is due to the I/O bottleneck associated with using Merkle trees for processing transactions. We present RAINBLOCK, a new architecture for public blockchains that increases throughput without affecting security. RAINBLOCK achieves this by tackling the I/O bottleneck on two fronts: first, decoupling transaction processing from I/O, and removing I/O from the critical path; second, reducing I/O amplification by customizing storage for blockchains. RAINBLOCK uses a novel variant of the Merkle tree, the Distributed Sharded Merkle tree (DSM-TREE) to store system state. We evaluate RAINBLOCK using workloads based on public Ethereum traces (including smart contracts) and show that RAINBLOCK processes 20K transactions per second in a geo-distributed setting with four regions spread across three continents.
Submitted 15 October, 2020; v1 submitted 25 September, 2019;
originally announced September 2019.
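For readers unfamiliar with how Merkle trees let a client verify data received from an untrusted server, the sketch below builds a plain binary Merkle tree over a list of transactions with SHA-256 and checks inclusion proofs against the root. It is a textbook construction for illustration only: it is neither Ethereum's Merkle Patricia trie nor RAINBLOCK's DSM-Tree, and all function names are hypothetical.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves):
    """Return every tree level, leaf hashes first and the root level last."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [_h(b"\x00" + leaf) for leaf in leaves]  # 0x00 prefix marks leaves
    levels = [level]
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level  # odd: duplicate last
        level = [_h(b"\x01" + padded[i] + padded[i + 1])           # 0x01 marks inner nodes
                 for i in range(0, len(padded), 2)]
        levels.append(level)
    return levels


def merkle_proof(levels, index):
    """Sibling hashes from leaf `index` up to (but excluding) the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        sib_hash = level[sibling] if sibling < len(level) else level[index]
        proof.append((sib_hash, index % 2 == 0))  # True: current node is a left child
        index //= 2
    return proof


def verify(root, leaf, proof):
    """Recompute the root from `leaf` and its proof; compare with `root`."""
    node = _h(b"\x00" + leaf)
    for sib_hash, is_left in proof:
        node = _h(b"\x01" + node + sib_hash) if is_left else _h(b"\x01" + sib_hash + node)
    return node == root


txs = [b"tx0", b"tx1", b"tx2"]
levels = merkle_levels(txs)
root = levels[-1][0]
```

A verifier needs only the root and a logarithmic number of sibling hashes per item; the cost the abstract points to comes from reading and updating the many tree nodes behind those hashes in storage.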
-
SplitFS: Reducing Software Overhead in File Systems for Persistent Memory
Authors:
Rohan Kadekodi,
Se Kwon Lee,
Sanidhya Kashyap,
Taesoo Kim,
Aasheesh Kolli,
Vijay Chidambaram
Abstract:
We present SplitFS, a file system for persistent memory (PM) that significantly reduces software overhead compared to state-of-the-art PM file systems. SplitFS presents a novel split of responsibilities between a user-space library file system and an existing kernel PM file system. The user-space library file system handles data operations by intercepting POSIX calls, memory-mapping the underlying file, and serving reads and overwrites using processor loads and stores. Metadata operations are handled by the kernel PM file system (ext4 DAX). SplitFS introduces a new primitive termed relink to efficiently support file appends and atomic data operations. SplitFS provides three consistency modes, which different applications can choose from without interfering with each other. SplitFS reduces software overhead by up to 4x compared to the NOVA PM file system, and 17x compared to ext4 DAX. On a number of micro-benchmarks and applications, such as the LevelDB key-value store running the YCSB benchmark, SplitFS increases application performance by up to 2x compared to ext4 DAX and NOVA while providing similar consistency guarantees.
Submitted 22 September, 2019;
originally announced September 2019.
-
Analyzing GDPR Compliance Through the Lens of Privacy Policy
Authors:
Jayashree Mohan,
Melissa Wasserman,
Vijay Chidambaram
Abstract:
With the arrival of the European Union's General Data Protection Regulation (GDPR), several companies are making significant changes to their systems to achieve compliance. The changes range from modifying privacy policies to redesigning systems that process personal data. This work analyzes the privacy policies of large-scale cloud services that seek to be GDPR compliant. The privacy policy is the main medium of information dissemination between the data controller and the users. We show that many services that claim compliance today do not have clear and concise privacy policies. We identify several points in the privacy policies that potentially indicate non-compliance; we term these GDPR vulnerabilities. We identify GDPR vulnerabilities in ten cloud services. Based on our analysis, we propose seven best practices for crafting GDPR privacy policies.
Submitted 28 June, 2019;
originally announced June 2019.
-
The Seven Sins of Personal-Data Processing Systems under GDPR
Authors:
Supreeth Shastri,
Melissa Wasserman,
Vijay Chidambaram
Abstract:
In recent years, our society has been plagued by unprecedented levels of privacy and security breaches. To rein in this trend, the European Union, in 2018, introduced comprehensive legislation called the General Data Protection Regulation (GDPR). In this paper, we review GDPR from a system design perspective and identify how its regulations conflict with the design, architecture, and operation of modern systems. We illustrate these conflicts via the seven GDPR sins: storing data forever; reusing data indiscriminately; walled gardens and black markets; risk-agnostic data processing; hiding data breaches; making unexplainable decisions; and treating security as a secondary goal. Our findings reveal a deep-rooted tussle between GDPR requirements and how modern systems have evolved. We believe that achieving compliance requires comprehensive, ground-up solutions, and anything short would amount to fixing a leaky faucet in a sinking ship.
Submitted 15 May, 2019; v1 submitted 7 March, 2019;
originally announced March 2019.
-
Analyzing the Impact of GDPR on Storage Systems
Authors:
Aashaka Shah,
Vinay Banakar,
Supreeth Shastri,
Melissa Wasserman,
Vijay Chidambaram
Abstract:
The recently introduced General Data Protection Regulation (GDPR) is forcing several companies to make significant changes to their systems to achieve compliance. Motivated by the finding that more than 30% of GDPR articles are related to storage, we investigate the impact of GDPR compliance on storage systems. We illustrate the challenges of retrofitting existing systems into compliance by modifying Redis to be GDPR-compliant. We show that although only a small set of new features needs to be introduced, strict real-time compliance (e.g., logging every user request synchronously) lowers Redis' throughput by 20x. Our work reveals how GDPR allows compliance to be a spectrum, and what its implications are for system designers. We discuss the technical challenges that need to be solved before strict compliance can be achieved efficiently.
Submitted 16 May, 2019; v1 submitted 8 March, 2019;
originally announced March 2019.
-
Finding Crash-Consistency Bugs with Bounded Black-Box Crash Testing
Authors:
Jayashree Mohan,
Ashlie Martinez,
Soujanya Ponnapalli,
Pandian Raju,
Vijay Chidambaram
Abstract:
We present a new approach to testing file-system crash consistency: bounded black-box crash testing (B3). B3 tests the file system in a black-box manner using workloads of file-system operations. Since the space of possible workloads is infinite, B3 bounds this space based on parameters such as the number of file-system operations or which operations to include, and exhaustively generates workloads within this bounded space. Each workload is tested on the target file system by simulating power-loss crashes while the workload is being executed, and checking if the file system recovers to a correct state after each crash. B3 builds upon insights derived from our study of crash-consistency bugs reported in Linux file systems in the last five years. We observed that most reported bugs can be reproduced using small workloads of three or fewer file-system operations on a newly created file system, and that all reported bugs result from crashes after fsync()-related system calls. We build two tools, CrashMonkey and ACE, to demonstrate the effectiveness of this approach. Our tools are able to find 24 out of the 26 crash-consistency bugs reported in the last five years. Our tools also revealed 10 new crash-consistency bugs in widely used, mature Linux file systems, seven of which had existed in the kernel since 2014. Our tools also found a crash-consistency bug in a verified file system, FSCQ. The new bugs result in severe consequences like broken rename atomicity and loss of persisted files.
Submitted 5 October, 2018;
originally announced October 2018.
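The core of bounded workload generation is exhaustive enumeration inside a bound. The sketch below enumerates every sequence of 1 to max_ops operations drawn from a small operation set over a small set of file names, and ends each workload with an fsync, reflecting the abstract's observation that reported bugs follow fsync()-related calls. The operation names and the (op, file) encoding are hypothetical; a real generator would also need to satisfy dependencies (for example, creating a file before writing to it), which is omitted here, and running each workload under simulated crashes is not shown.

```python
from itertools import product


def bounded_workloads(ops, files, max_ops):
    """Yield every workload of 1..max_ops (op, file) steps, ending in fsync.

    The space is bounded by max_ops and by the chosen op/file sets, then
    enumerated exhaustively with itertools.product.
    """
    steps = [(op, f) for op in ops for f in files]
    for length in range(1, max_ops + 1):
        for seq in product(steps, repeat=length):
            # Persistence point: fsync the file touched by the last step.
            yield list(seq) + [("fsync", seq[-1][1])]


OPS = ["creat", "write", "truncate", "unlink"]  # hypothetical operation set
FILES = ["A", "B"]
workloads = list(bounded_workloads(OPS, FILES, max_ops=2))
```

With 4 operations over 2 files there are 8 possible steps, so the bound max_ops=2 yields 8 + 64 = 72 workloads, and max_ops=3 yields 8 + 64 + 512 = 584. The count grows exponentially in the bound, which is why small bounds such as "three or fewer operations" matter.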
-
Analyzing IO Amplification in Linux File Systems
Authors:
Jayashree Mohan,
Rohan Kadekodi,
Vijay Chidambaram
Abstract:
We present the first systematic analysis of read, write, and space amplification in Linux file systems. While many researchers are tackling write amplification in key-value stores, IO amplification in file systems has been largely unexplored. We analyze data and metadata operations on five widely-used Linux file systems: ext2, ext4, XFS, btrfs, and F2FS. We find that data operations result in significant write amplification (2-32X) and that metadata operations have a large IO cost. For example, a single rename requires 648 KB write IO in btrfs. We also find that small random reads result in read amplification of 2-13X. Based on these observations, we present the CReWS conjecture about the relationship between IO amplification, consistency, and storage space utilization. We hope this paper spurs people to design future file systems with less IO amplification, especially for non-volatile memory technologies.
Submitted 26 July, 2017;
originally announced July 2017.
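The amplification factors in the abstract are ratios of the IO the device actually performs to the IO the application asked for. The worked numbers below are hypothetical, chosen to land at the two ends of the 2-32X write-amplification range the abstract reports; the function name is an assumption.

```python
KIB = 1024


def amplification(device_bytes: int, app_bytes: int) -> float:
    """IO amplification = bytes transferred by the device / bytes requested by the app."""
    if app_bytes <= 0:
        raise ValueError("app_bytes must be positive")
    return device_bytes / app_bytes


# Hypothetical: a 4 KiB application write that causes 128 KiB of device writes.
high_write_amp = amplification(128 * KIB, 4 * KIB)
# Hypothetical: a 4 KiB application write that causes 8 KiB of device writes.
low_write_amp = amplification(8 * KIB, 4 * KIB)
```

A rename carries no application data payload, so there is no meaningful denominator; this is presumably why the abstract reports the btrfs rename cost as an absolute 648 KB of write IO rather than as a ratio.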