
Showing 1–2 of 2 results for author: McKinney, W

Searching in archive cs.
  1. arXiv:2304.05028

    cs.DB

    An Empirical Evaluation of Columnar Storage Formats

    Authors: Xinyu Zeng, Yulong Hui, Jiahong Shen, Andrew Pavlo, Wes McKinney, Huanchen Zhang

    Abstract: Columnar storage is a core component of a modern data analytics system. Although many database management systems (DBMSs) have proprietary storage formats, most provide extensive support to open-source storage formats such as Parquet and ORC to facilitate cross-platform data sharing. But these formats were developed over a decade ago, in the early 2010s, for the Hadoop ecosystem. Since then, both…

    Submitted 7 November, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

    Comments: 15 pages; typos corrected, missing figure legend added

  2. arXiv:2004.14471

    cs.DB

    Mainlining Databases: Supporting Fast Transactional Workloads on Universal Columnar Data File Formats

    Authors: Tianyu Li, Matthew Butrovich, Amadou Ngom, Wan Shen Lim, Wes McKinney, Andrew Pavlo

    Abstract: The proliferation of modern data processing tools has given rise to open-source columnar data formats. The advantage of these formats is that they help organizations avoid repeatedly converting data to a new format for each application. These formats, however, are read-only, and organizations must use a heavy-weight transformation process to load data from on-line transactional processing (OLTP) s…

    Submitted 29 April, 2020; originally announced April 2020.

    Comments: 16 pages