Showing 1–1 of 1 results for author: Ghit, B

  1. arXiv:2112.06280

    cs.DC

    In-Memory Indexed Caching for Distributed Data Processing

    Authors: Alexandru Uta, Bogdan Ghit, Ankur Dave, Jan Rellermeyer, Peter Boncz

    Abstract: Powerful abstractions such as dataframes are only as efficient as their underlying runtime system. The de-facto distributed data processing framework, Apache Spark, is poorly suited for the modern cloud-based data-science workloads due to its outdated assumptions: static datasets analyzed using coarse-grained transformations. In this paper, we introduce the Indexed DataFrame, an in-memory cache th…

    Submitted 8 February, 2022; v1 submitted 12 December, 2021; originally announced December 2021.

    Comments: Accepted for publication at IEEE IPDPS 2022