Showing 1–10 of 10 results for author: Guok, C

  1. Effectiveness and predictability of in-network storage cache for scientific workflows

    Authors: Caitlin Sim, Kesheng Wu, Alex Sim, Inder Monga, Chin Guok, Frank Wurthwein, Diego Davila, Harvey Newman, Justas Balcas

    Abstract: Large scientific collaborations often have multiple scientists accessing the same set of files while doing different analyses, which create repeated accesses to the large amounts of shared data located far away. These data accesses have long latency due to distance and occupy the limited bandwidth available over the wide-area network. To reduce the wide-area network traffic and the data access lat…

    Submitted 20 July, 2023; originally announced July 2023.

  2. Analyzing Transatlantic Network Traffic over Scientific Data Caches

    Authors: Z. Deng, A. Sim, K. Wu, C. Guok, D. Hazen, I. Monga, F. Andrijauskas, F. Wuerthwein, D. Weitzel

    Abstract: Large scientific collaborations often share huge volumes of data around the world. Consequently a significant amount of network bandwidth is needed for data replication and data access. Users in the same region may possibly share resources as well as data, especially when they are working on related topics with similar datasets. In this work, we study the network traffic patterns and resource util…

    Submitted 17 July, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

  3. Managed Network Services for Exascale Data Movement Across Large Global Scientific Collaborations

    Authors: Frank Würthwein, Jonathan Guiang, Aashay Arora, Diego Davila, John Graham, Dima Mishin, Thomas Hutton, Igor Sfiligoi, Harvey Newman, Justas Balcas, Tom Lehman, Xi Yang, Chin Guok

    Abstract: Unique scientific instruments designed and operated by large global collaborations are expected to produce Exabyte-scale data volumes per year by 2030. These collaborations depend on globally distributed storage and compute to turn raw data into science. While all of these infrastructures have batch scheduling capabilities to share compute, Research and Education networks lack those capabilities.…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: Submitted to the proceedings of the XLOOP workshop held in conjunction with Supercomputing 22

  4. arXiv:2209.08868

    physics.comp-ph cs.DC hep-ex hep-lat hep-th

    Snowmass 2021 Computational Frontier CompF4 Topical Group Report: Storage and Processing Resource Access

    Authors: W. Bhimji, D. Carder, E. Dart, J. Duarte, I. Fisk, R. Gardner, C. Guok, B. Jayatilaka, T. Lehman, M. Lin, C. Maltzahn, S. McKee, M. S. Neubauer, O. Rind, O. Shadura, N. V. Tran, P. van Gemmeren, G. Watts, B. A. Weaver, F. Würthwein

    Abstract: Computing plays a significant role in all areas of high energy physics. The Snowmass 2021 CompF4 topical group's scope is facilities R&D, where we consider "facilities" as the computing hardware and software infrastructure inside the data centers plus the networking between data centers, irrespective of who owns them, and what policies are applied for using them. In other words, it includes commer…

    Submitted 29 September, 2022; v1 submitted 19 September, 2022; originally announced September 2022.

    Comments: Snowmass 2021 Computational Frontier CompF4 topical group report. v2: Expanded introduction. Updated author list. 52 pages, 6 figures

  5. arXiv:2205.05598

    cs.DC cs.NI eess.SY

    Studying Scientific Data Lifecycle in On-demand Distributed Storage Caches

    Authors: Julian Bellavita, Alex Sim, Kesheng Wu, Inder Monga, Chin Guok, Frank Würthwein, Diego Davila

    Abstract: The XRootD system is used to transfer, store, and cache large datasets from high-energy physics (HEP). In this study we focus on its capability as distributed on-demand storage cache. Through exploring a large set of daily log files between 2020 and 2021, we seek to understand the data access patterns that might inform future cache design. Our study begins with a set of summary statistics regardin…

    Submitted 11 May, 2022; originally announced May 2022.

  6. arXiv:2205.05563

    cs.NI cs.DC cs.LG cs.PF

    Access Trends of In-network Cache for Scientific Data

    Authors: Ruize Han, Alex Sim, Kesheng Wu, Inder Monga, Chin Guok, Frank Würthwein, Diego Davila, Justas Balcas, Harvey Newman

    Abstract: Scientific collaborations are increasingly relying on large volumes of data for their work and many of them employ tiered systems to replicate the data to their worldwide user communities. Each user in the community often selects a different subset of data for their analysis tasks; however, members of a research group often are working on related research topics that require similar data objects.…

    Submitted 11 May, 2022; originally announced May 2022.

  7. arXiv:2203.08280

    cs.NI

    Data Transfer and Network Services management for Domain Science Workflows

    Authors: Tom Lehman, Xi Yang, Chin Guok, Frank Wuerthwein, Igor Sfiligoi, John Graham, Aashay Arora, Dima Mishin, Diego Davila, Jonathan Guiang, Tom Hutton, Harvey Newman, Justas Balcas

    Abstract: This paper describes a vision and work in progress to elevate network resources and data transfer management to the same level as compute and storage in the context of services access, scheduling, life cycle management, and orchestration. While domain science workflows often include active compute resource allocation and management, the data transfers and associated network resource coordination i…

    Submitted 20 March, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

    Comments: contribution to Snowmass 2022

  8. arXiv:2203.06843

    cs.NI

    Deploying in-network caches in support of distributed scientific data sharing

    Authors: Alex Sim, Ezra Kissel, Chin Guok

    Abstract: The importance of intelligent data placement, management, and analysis has become apparent as scientific data volumes across the network continue to increase. To that end, we describe the use of in-network caching service deployments as a means to improve application performance and preserve available network bandwidth in a high energy physics data distribution environment. Details of the software…

    Submitted 13 March, 2022; originally announced March 2022.

    Comments: contribution to Snowmass 2021

  9. Analyzing scientific data sharing patterns for in-network data caching

    Authors: Elizabeth Copps, Huiyi Zhang, Alex Sim, Kesheng Wu, Inder Monga, Chin Guok, Frank Würthwein, Diego Davila, Edgar Fajardo

    Abstract: The volume of data moving through a network increases with new scientific experiments and simulations. Network bandwidth requirements also increase proportionally to deliver data within a certain time frame. We observe that a significant portion of the popular dataset is transferred multiple times to different users as well as to the same user for various reasons. In-network data caching for the s…

    Submitted 3 May, 2021; originally announced May 2021.

  10. Software-Defined Network for End-to-end Networked Science at the Exascale

    Authors: Inder Monga, Chin Guok, John MacAuley, Alex Sim, Harvey Newman, Justas Balcas, Phil DeMar, Linda Winkler, Tom Lehman, Xi Yang

    Abstract: Domain science applications and workflow processes are currently forced to view the network as an opaque infrastructure into which they inject data and hope that it emerges at the destination with an acceptable Quality of Experience. There is little ability for applications to interact with the network to exchange information, negotiate performance parameters, discover expected performance metrics…

    Submitted 13 April, 2020; originally announced April 2020.

    Comments: To appear in the journal of Future Generation Computer Systems

    Report number: FUTURE5588