The benefits of prefetching for large-scale cloud-based neuroimaging analysis workflows

Hayot-Sasson, Valerie; Glatard, Tristan; Rokem, Ariel

Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:2108.10496 (cs)

[Submitted on 24 Aug 2021]

Abstract: To support the growing demands of neuroscience applications, researchers are transitioning to cloud computing for its scalable, robust and elastic infrastructure. Nevertheless, large datasets residing in object stores may result in significant data transfer overheads during workflow execution. Prefetching, a method to mitigate the cost of reading in mixed workloads, masks data transfer costs within the processing time of prior tasks. We present an implementation of "Rolling Prefetch", a Python library that implements a particular form of prefetching from the AWS S3 object store, and we quantify its benefits.
Rolling Prefetch extends S3Fs, a Python library exposing AWS S3 functionality via a file object, to add prefetch capabilities. In measurements of analysis performance on a 500 GB brain connectivity dataset stored on S3, we found that prefetching provides significant speed-ups of up to 1.86x, even in applications consisting entirely of data loading. The observed speed-up values are consistent with our theoretical analysis. Our results demonstrate the usefulness of prefetching for scientific data processing on cloud infrastructures and provide an implementation applicable to various application domains.
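
The general idea of masking transfer time behind processing can be illustrated with a small sketch. This is not the Rolling Prefetch implementation and does not use the S3Fs API: the names `fetch_block` and `prefetching_reader` are hypothetical, an in-memory `bytes` object stands in for a remote S3 object, and `time.sleep` stands in for transfer latency. A background thread fetches blocks ahead of the consumer into a bounded queue, so that the next block is (ideally) already local when the current one has been processed.

```python
import queue
import threading
import time
from typing import Iterator


def fetch_block(store: bytes, offset: int, size: int, latency: float = 0.0) -> bytes:
    """Hypothetical stand-in for a remote range read (e.g. from an object store)."""
    time.sleep(latency)  # simulated transfer time; an assumption for illustration
    return store[offset:offset + size]


def prefetching_reader(
    store: bytes, block_size: int, depth: int = 2, latency: float = 0.0
) -> Iterator[bytes]:
    """Yield blocks in order while a background thread fetches up to `depth` ahead."""
    blocks: queue.Queue = queue.Queue(maxsize=depth)  # bounded: limits local cache use
    sentinel = object()  # marks the end of the stream

    def producer() -> None:
        try:
            for offset in range(0, len(store), block_size):
                # put() blocks when the queue is full, so fetching never runs
                # more than `depth` blocks ahead of the consumer.
                blocks.put(fetch_block(store, offset, block_size, latency))
            blocks.put(sentinel)
        except Exception as exc:  # hand fetch errors to the consumer
            blocks.put(exc)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = blocks.get()
        if item is sentinel:
            return
        if isinstance(item, Exception):
            raise item
        yield item


if __name__ == "__main__":
    data = bytes(range(256)) * 4  # 1024 bytes of dummy data
    start = time.perf_counter()
    total = 0
    for block in prefetching_reader(data, block_size=100, latency=0.01):
        time.sleep(0.01)  # simulated per-block processing
        total += len(block)
    elapsed = time.perf_counter() - start
    print(f"read {total} bytes in {elapsed:.3f} s with prefetching")
```

In this sketch the bounded queue plays the role of a limited local cache: the producer stalls once `depth` blocks are waiting, and resumes as the consumer drains them. When simulated fetch and processing times are comparable, the loop should take roughly the longer of the two per block rather than their sum, which is the effect the abstract describes.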

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2108.10496 [cs.DC]
	(or arXiv:2108.10496v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2108.10496

Submission history

From: Valérie Hayot-Sasson
[v1] Tue, 24 Aug 2021 02:52:15 UTC (160 KB)
