Showing 1–3 of 3 results for author: Staylor, M

  1. arXiv:2403.15721  [pdf, other]

    cs.DC

    Design and Implementation of an Analysis Pipeline for Heterogeneous Data

    Authors: Arup Kumar Sarker, Aymen Alsaadi, Niranda Perera, Mills Staylor, Gregor von Laszewski, Matteo Turilli, Ozgur Ozan Kilic, Mikhail Titov, Andre Merzky, Shantenu Jha, Geoffrey Fox

    Abstract: Managing and preparing complex data for deep learning, a prevalent approach in large-scale data science, can be challenging. Data transfer for model training also presents difficulties, impacting scientific fields like genomics, climate modeling, and astronomy. A large-scale solution like Google Pathways with a distributed execution environment for deep learning models exists but is proprietary. In…

    Submitted 7 April, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

    Comments: 14 pages, 16 figures, 2 tables

    ACM Class: H.2.4; D.2.7; D.2.2

  2. arXiv:2307.01394  [pdf, ps, other]

    cs.DC cs.AI cs.IR cs.LG

    In-depth Analysis On Parallel Processing Patterns for High-Performance Dataframes

    Authors: Niranda Perera, Arup Kumar Sarker, Mills Staylor, Gregor von Laszewski, Kaiying Shan, Supun Kamburugamuve, Chathura Widanage, Vibhatha Abeykoon, Thejaka Amila Kanewela, Geoffrey Fox

    Abstract: The Data Science domain has expanded monumentally in both research and industry communities during the past decade, predominantly owing to the Big Data revolution. Artificial Intelligence (AI) and Machine Learning (ML) are bringing more complexities to data engineering applications, which are now integrated into data processing pipelines to process terabytes of data. Typically, a significant amoun…

    Submitted 3 July, 2023; originally announced July 2023.

    Report number: FGCS-D-23-00577R1

  3. arXiv:2301.07896  [pdf, other]

    cs.DC cs.DB

    Supercharging Distributed Computing Environments For High Performance Data Engineering

    Authors: Niranda Perera, Kaiying Shan, Supun Kamburugamuve, Thejaka Amila Kanewela, Chathura Widanage, Arup Sarker, Mills Staylor, Tianle Zhong, Vibhatha Abeykoon, Geoffrey Fox

    Abstract: The data engineering and data science community has embraced the idea of using Python & R dataframes for regular applications. Driven by the big data revolution and artificial intelligence, these applications are now essential in order to process terabytes of data. They can easily exceed the capabilities of a single machine, but also demand significant developer time & effort. Therefore, it is esse…

    Submitted 19 January, 2023; originally announced January 2023.