Showing 1–12 of 12 results for author: Pavlo, A

  1. arXiv:2304.05028  [pdf, other]

    cs.DB

    An Empirical Evaluation of Columnar Storage Formats

    Authors: Xinyu Zeng, Yulong Hui, Jiahong Shen, Andrew Pavlo, Wes McKinney, Huanchen Zhang

    Abstract: Columnar storage is a core component of a modern data analytics system. Although many database management systems (DBMSs) have proprietary storage formats, most provide extensive support to open-source storage formats such as Parquet and ORC to facilitate cross-platform data sharing. But these formats were developed over a decade ago, in the early 2010s, for the Hadoop ecosystem. Since then, both…

    Submitted 7 November, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

    Comments: 15 pages; typos corrected, missing figure legend added

  2. arXiv:2010.06760  [pdf, other]

    cs.DB

    Taurus: Lightweight Parallel Logging for In-Memory Database Management Systems (Extended Version)

    Authors: Yu Xia, Xiangyao Yu, Andrew Pavlo, Srinivas Devadas

    Abstract: Existing single-stream logging schemes are unsuitable for in-memory database management systems (DBMSs) as the single log is often a performance bottleneck. To overcome this problem, we present Taurus, an efficient parallel logging scheme that uses multiple log streams, and is compatible with both data and command logging. Taurus tracks and encodes transaction dependencies using a vector of log se…

    Submitted 13 October, 2020; originally announced October 2020.

  3. arXiv:2004.14471  [pdf, other]

    cs.DB

    Mainlining Databases: Supporting Fast Transactional Workloads on Universal Columnar Data File Formats

    Authors: Tianyu Li, Matthew Butrovich, Amadou Ngom, Wan Shen Lim, Wes McKinney, Andrew Pavlo

    Abstract: The proliferation of modern data processing tools has given rise to open-source columnar data formats. The advantage of these formats is that they help organizations avoid repeatedly converting data to a new format for each application. These formats, however, are read-only, and organizations must use a heavy-weight transformation process to load data from on-line transactional processing (OLTP) s…

    Submitted 29 April, 2020; originally announced April 2020.

    Comments: 16 pages

  4. arXiv:2004.09619  [pdf, other]

    cs.OS

    Vilamb: Low Overhead Asynchronous Redundancy for Direct Access NVM

    Authors: Rajat Kateja, Andy Pavlo, Gregory R. Ganger

    Abstract: Vilamb provides efficient asynchronous system-redundancy for direct access (DAX) non-volatile memory (NVM) storage. Production storage deployments often use system-redundancy in the form of page checksums and cross-page parity. State-of-the-art solutions for maintaining system-redundancy for DAX NVM either incur a high performance overhead or require specialized hardware. The Vilamb user-space library…

    Submitted 20 April, 2020; originally announced April 2020.

    Report number: CMU-PDL-20-101

  5. arXiv:2003.02391  [pdf, other]

    cs.DB

    Order-Preserving Key Compression for In-Memory Search Trees

    Authors: Huanchen Zhang, Xiaoxuan Liu, David G. Andersen, Michael Kaminsky, Kimberly Keeton, Andrew Pavlo

    Abstract: We present the High-speed Order-Preserving Encoder (HOPE) for in-memory search trees. HOPE is a fast dictionary-based compressor that encodes arbitrary keys while preserving their order. HOPE's approach is to identify common key patterns at a fine granularity and exploit the entropy to achieve high compression rates with a small dictionary. We first develop a theoretical model to reason about orde…

    Submitted 4 March, 2020; originally announced March 2020.

    Comments: SIGMOD'20 version + Appendix

  6. arXiv:1903.02990  [pdf, other]

    cs.DB

    Scheduling OLTP Transactions via Machine Learning

    Authors: Yangjun Sheng, Anthony Tomasic, Tieying Zhang, Andrew Pavlo

    Abstract: Current main memory database system architectures are still challenged by high-contention workloads, and this challenge will continue to grow as the number of cores in processors continues to increase. These systems schedule transactions randomly across cores to maximize concurrency and to produce a uniform load across cores. Scheduling never considers potential conflicts. Performance could be impr…

    Submitted 29 May, 2019; v1 submitted 7 March, 2019; originally announced March 2019.

  7. arXiv:1901.10938  [pdf, other]

    cs.DB

    Multi-Tier Buffer Management and Storage System Design for Non-Volatile Memory

    Authors: Joy Arulraj, Andy Pavlo, Krishna Teja Malladi

    Abstract: The design of the buffer manager in database management systems (DBMSs) is influenced by the performance characteristics of volatile memory (DRAM) and non-volatile storage (e.g., SSD). The key design assumptions have been that the data must be migrated to DRAM for the DBMS to operate on it and that storage is orders of magnitude slower than DRAM. But the arrival of new non-volatile memory (NVM) te…

    Submitted 30 January, 2019; originally announced January 2019.

    Comments: 16 pages

  8. arXiv:1901.07064  [pdf, other]

    cs.DB

    Predictive Indexing

    Authors: Joy Arulraj, Ran Xian, Lin Ma, Andrew Pavlo

    Abstract: There has been considerable research on automated index tuning in database management systems (DBMSs). But the majority of these solutions tune the index configuration by retrospectively making computationally expensive physical design changes all at once. Such changes degrade the DBMS's performance during the process, and have reduced utility during subsequent query processing due to the delay be…

    Submitted 21 January, 2019; originally announced January 2019.

    Comments: 12 pages

    ACM Class: H.2.2; H.2.4

  9. arXiv:1503.01143  [pdf, other]

    cs.DB

    S-Store: Streaming Meets Transaction Processing

    Authors: John Meehan, Nesime Tatbul, Stan Zdonik, Cansu Aslantas, Ugur Cetintemel, Jiang Du, Tim Kraska, Samuel Madden, David Maier, Andrew Pavlo, Michael Stonebraker, Kristin Tufte, Hao Wang

    Abstract: Stream processing addresses the needs of real-time applications. Transaction processing addresses the coordination and safety of short atomic computations. Heretofore, these two modes of operation existed in separate, stove-piped systems. In this work, we attempt to fuse the two computational paradigms in a single system called S-Store. In this way, S-Store can simultaneously accommodate OLTP and…

    Submitted 10 March, 2015; v1 submitted 3 March, 2015; originally announced March 2015.

  10. arXiv:1110.6647  [pdf, other]

    cs.DB

    On Predictive Modeling for Optimizing Transaction Execution in Parallel OLTP Systems

    Authors: Andrew Pavlo, Evan P. C. Jones, Stanley Zdonik

    Abstract: A new emerging class of parallel database management systems (DBMS) is designed to take advantage of the partitionable workloads of on-line transaction processing (OLTP) applications. Transactions in these systems are optimized to execute to completion on a single node in a shared-nothing cluster without needing to coordinate with other nodes or use expensive concurrency control measures. But some…

    Submitted 30 October, 2011; originally announced October 2011.

    Comments: VLDB2012

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 2, pp. 85-96 (2011)

  11. arXiv:1101.0350  [pdf, ps, other]

    cs.NI

    Graffiti Networks: A Subversive, Internet-Scale File Sharing Model

    Authors: Andrew Pavlo, Ning Shi

    Abstract: The proliferation of peer-to-peer (P2P) file sharing protocols is due to their efficient and scalable methods for data dissemination to numerous users. But many of these networks have no provisions to provide users with long term access to files after the initial interest has diminished, nor are they able to guarantee protection for users from malicious clients that wish to implicate them in incri…

    Submitted 1 January, 2011; originally announced January 2011.

  12. arXiv:cs/0606007  [pdf, ps, other]

    cs.HC cs.CG cs.GR

    A parent-centered radial layout algorithm for interactive graph visualization and animation

    Authors: Andrew Pavlo, Christopher Homan, Jonathan Schull

    Abstract: We have developed (1) a graph visualization system that allows users to explore graphs by viewing them as a succession of spanning trees selected interactively, (2) a radial graph layout algorithm, and (3) an animation algorithm that generates meaningful visualizations and smooth transitions between graphs while minimizing edge crossings during transitions and in static layouts. Our system is…

    Submitted 1 June, 2006; originally announced June 2006.

    ACM Class: I.3.3; H.5.0