Showing 1–14 of 14 results for author: Maltzahn, C

  1. arXiv:2212.11459  [pdf, ps, other]

    cs.DC

    A Moveable Beast: Partitioning Data and Compute for Computational Storage

    Authors: Aldrin Montana, Yuanqing Xue, Jeff LeFevre, Carlos Maltzahn, Josh Stuart, Philip Kufeldt, Peter Alvaro

    Abstract: Over the years, hardware trends have introduced various heterogeneous compute units while also bringing network and storage bandwidths within an order of magnitude of memory subsystems. In response, developers have used increasingly exotic solutions to extract more performance from hardware; typically relying on static, design-time partitioning of their programs which cannot keep pace with storage…

    Submitted 18 January, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

    Comments: 14 pages, 7 figures, submitted to SIGMOD 2023; updated acknowledgements

    ACM Class: H.2.4; H.2.6; J.3

  2. arXiv:2211.05118  [pdf, other]

    cs.SE cs.MS

    Mapping Out the HPC Dependency Chaos

    Authors: Farid Zakaria, Thomas R. W. Scogland, Todd Gamblin, Carlos Maltzahn

    Abstract: High Performance Computing (HPC) software stacks have become complex, with the dependencies of some applications numbering in the hundreds. Packaging, distributing, and administering software stacks of that scale is a complex undertaking anywhere. HPC systems deal with esoteric compilers, hardware, and a panoply of uncommon combinations. In this paper, we explore the mechanisms available for packa…

    Submitted 10 November, 2022; v1 submitted 22 October, 2022; originally announced November 2022.

    Comments: Presented at SuperComputing 2022 (https://sc22.supercomputing.org/program/)

  3. arXiv:2210.06827  [pdf, other]

    cs.DC cs.NI cs.PF

    Processing Particle Data Flows with SmartNICs

    Authors: Jianshen Liu, Carlos Maltzahn, Matthew L. Curry, Craig Ulmer

    Abstract: Many distributed applications implement complex data flows and need a flexible mechanism for routing data between producers and consumers. Recent advances in programmable network interface cards, or SmartNICs, represent an opportunity to offload data-flow tasks into the network fabric, thereby freeing the hosts to perform other work. System architects in this space face multiple questions about th…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: This is an expansion of the paper with the same name published in HPEC'22

    MSC Class: 65Y20; 60L10; 90B18

    ACM Class: B.4.1; B.8.2; C.4; D.2.13; E.2

    Journal ref: 2022 IEEE High Performance Extreme Computing Virtual Conference (HPEC'22)

  4. arXiv:2209.08868  [pdf, other]

    physics.comp-ph cs.DC hep-ex hep-lat hep-th

    Snowmass 2021 Computational Frontier CompF4 Topical Group Report: Storage and Processing Resource Access

    Authors: W. Bhimji, D. Carder, E. Dart, J. Duarte, I. Fisk, R. Gardner, C. Guok, B. Jayatilaka, T. Lehman, M. Lin, C. Maltzahn, S. McKee, M. S. Neubauer, O. Rind, O. Shadura, N. V. Tran, P. van Gemmeren, G. Watts, B. A. Weaver, F. Würthwein

    Abstract: Computing plays a significant role in all areas of high energy physics. The Snowmass 2021 CompF4 topical group's scope is facilities R&D, where we consider "facilities" as the computing hardware and software infrastructure inside the data centers plus the networking between data centers, irrespective of who owns them, and what policies are applied for using them. In other words, it includes commer…

    Submitted 29 September, 2022; v1 submitted 19 September, 2022; originally announced September 2022.

    Comments: Snowmass 2021 Computational Frontier CompF4 topical group report. v2: Expanded introduction. Updated author list. 52 pages, 6 figures

  5. arXiv:2206.02906  [pdf, other]

    cs.DC

    Managing Bufferbloat in Cloud Storage Systems

    Authors: Seyed Esmaeil Mirvakili, Samuel Just, Carlos Maltzahn

    Abstract: Today, companies and data centers are moving towards cloud and serverless storage systems instead of traditional file systems. As a result of such a transition, allocating sufficient resources to users and parties to satisfy their service level demands has become crucial in cloud storage. In cloud storage, the schedulability of system components and requests is of great importance to achieving QoS…

    Submitted 3 April, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

  6. arXiv:2204.06074  [pdf, other]

    cs.DC

    Skyhook: Towards an Arrow-Native Storage System

    Authors: Jayjeet Chakraborty, Ivo Jimenez, Sebastiaan Alvarez Rodriguez, Alexandru Uta, Jeff LeFevre, Carlos Maltzahn

    Abstract: With the ever-increasing dataset sizes, several file formats such as Parquet, ORC, and Avro have been developed to store data efficiently, save the network, and interconnect bandwidth at the price of additional CPU utilization. However, with the advent of networks supporting 25-100 Gb/s and storage devices delivering 1,000,000 reqs/sec, the CPU has become the bottleneck trying to keep up feeding…

    Submitted 12 April, 2022; originally announced April 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2105.09894

  7. arXiv:2106.13020  [pdf, other]

    cs.DC

    Zero-Cost, Arrow-Enabled Data Interface for Apache Spark

    Authors: Sebastiaan Alvarez Rodriguez, Jayjeet Chakraborty, Aaron Chu, Ivo Jimenez, Jeff LeFevre, Carlos Maltzahn, Alexandru Uta

    Abstract: Distributed data processing ecosystems are widespread and their components are highly specialized, such that efficient interoperability is urgent. Recently, Apache Arrow was chosen by the community to serve as a format mediator, providing efficient in-memory data representation. Arrow enables efficient data movement between data processing and storage engines, significantly improving interoperabil…

    Submitted 27 November, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

    Comments: 6 pages, 6 figures

  8. arXiv:2105.09894  [pdf, other]

    cs.DC

    Towards an Arrow-native Storage System

    Authors: Jayjeet Chakraborty, Ivo Jimenez, Sebastiaan Alvarez Rodriguez, Alexandru Uta, Jeff LeFevre, Carlos Maltzahn

    Abstract: With the ever-increasing dataset sizes, several file formats like Parquet, ORC, and Avro have been developed to store data efficiently and to save network and interconnect bandwidth at the price of additional CPU utilization. However, with the advent of networks supporting 25-100 Gb/s and storage devices delivering 1,000,000 reqs/sec the CPU has become the bottleneck, trying to keep up feeding d…

    Submitted 21 May, 2021; v1 submitted 20 May, 2021; originally announced May 2021.

    Comments: 7 pages, 6 figures, workshop

  9. arXiv:2105.06619  [pdf, other]

    cs.NI cs.PF

    Performance Characteristics of the BlueField-2 SmartNIC

    Authors: Jianshen Liu, Carlos Maltzahn, Craig Ulmer, Matthew Leon Curry

    Abstract: High-performance computing (HPC) researchers have long envisioned scenarios where application workflows could be improved through the use of programmable processing elements embedded in the network fabric. Recently, vendors have introduced programmable Smart Network Interface Cards (SmartNICs) that enable computations to be offloaded to the edge of the network. There is great interest in both the…

    Submitted 13 May, 2021; originally announced May 2021.

    Comments: 13 pages, 8 figures, 4 tables

    ACM Class: B.8.2; C.2.0

  10. arXiv:2012.01144  [pdf, ps, other]

    cs.CY cs.SE

    The CROSS Incubator: A Case Study for funding and training RSEs

    Authors: Stephanie Lieggi, Ivo Jimenez, Jeff LeFevre, Carlos Maltzahn

    Abstract: The incubator and research projects sponsored by the Center for Research in Open Source Software (CROSS, cross.ucsc.edu) at UC Santa Cruz have been very effective at promoting the professional and technical development of research software engineers. Carlos Maltzahn founded CROSS in 2015 with a generous gift of $2,000,000 from UC Santa Cruz alumnus Dr. Sage Weil and founding memberships of Toshiba…

    Submitted 30 November, 2020; originally announced December 2020.

    Comments: Presented at RSE-HPC 2020: Research Software Engineers in HPC - Creating Community, Building Careers, Addressing Challenges, co-located with SC20, Virtual, November 12, 2020

  11. arXiv:2007.01789  [pdf, other]

    cs.DC

    Mapping Datasets to Object Storage System

    Authors: Xiaowei Chu, Jeff LeFevre, Aldrin Montana, Dana Robinson, Quincey Koziol, Peter Alvaro, Carlos Maltzahn

    Abstract: Access libraries such as ROOT and HDF5 allow users to interact with datasets using high level abstractions, like coordinate systems and associated slicing operations. Unfortunately, the implementations of access libraries are based on outdated assumptions about storage systems interfaces and are generally unable to fully benefit from modern fast storage devices. The situation is getting worse with…

    Submitted 3 July, 2020; originally announced July 2020.

    Journal ref: In 24th International Conference on Computing in High Energy & Nuclear Physics, Adelaide, Australia, November 4-8 2019

  12. arXiv:1912.09256  [pdf, other]

    cs.PF cs.DC

    Is Big Data Performance Reproducible in Modern Cloud Networks?

    Authors: Alexandru Uta, Alexandru Custura, Dmitry Duplyakin, Ivo Jimenez, Jan Rellermeyer, Carlos Maltzahn, Robert Ricci, Alexandru Iosup

    Abstract: Performance variability has been acknowledged as a problem for over a decade by cloud practitioners and performance engineers. Yet, our survey of top systems conferences reveals that the research community regularly disregards variability when running experiments in the cloud. Focusing on networks, we assess the impact of variability on cloud-based big-data workloads by gathering traces from mains…

    Submitted 19 December, 2019; originally announced December 2019.

    Comments: 12 pages paper, 3 pages references

  13. arXiv:1909.04550  [pdf, other]

    cs.DC cs.DB cs.PF

    MBWU: Benefit Quantification for Data Access Function Offloading

    Authors: Jianshen Liu, Philip Kufeldt, Carlos Maltzahn

    Abstract: The storage industry is considering new kinds of storage devices that support data access function offloading, i.e. the ability to perform data access functions on the storage device itself as opposed to performing it on a separate compute system to which the storage device is connected. But what is the benefit of offloading to a storage device that is controlled by an embedded platform, very diff…

    Submitted 9 September, 2019; originally announced September 2019.

    Comments: 16 pages, 11 figures

    Journal ref: HPC I/O in the Data Center Workshop, 2019

  14. arXiv:1406.3699  [pdf, ps, other]

    cs.DC

    Distributed Versioned Object Storage -- Alternatives at the OSD layer (Poster Extended Abstract)

    Authors: Ivo Jimenez, Carlos Maltzahn, Jay Lofstead

    Abstract: The ability to store multiple versions of a data item is a powerful primitive that has had a wide variety of uses: relational databases, transactional memory, version control systems, to name a few. However, each implementation uses a very particular form of versioning that is customized to the domain in question and hidden away from the user. In our ongoing project, we are reviewing and analyzing m…

    Submitted 14 June, 2014; originally announced June 2014.

    Comments: 2 pages, 2 tables, poster extended abstract, HPDC '14, The ACM International Symposium on High-Performance Parallel and Distributed Computing