
Showing 1–10 of 10 results for author: Widera, R

Searching in archive cs.
  1. arXiv:2209.09731

    cs.DC cs.AR

    Application Experiences on a GPU-Accelerated Arm-based HPC Testbed

    Authors: Wael Elwasif, William Godoy, Nick Hagerty, J. Austin Harris, Oscar Hernandez, Balint Joo, Paul Kent, Damien Lebrun-Grandie, Elijah Maccarthy, Veronica G. Melesse Vergara, Bronson Messer, Ross Miller, Sarp Oral, Sergei Bastrakov, Michael Bussmann, Alexander Debus, Klaus Steiniger, Jan Stephan, Rene Widera, Spencer H. Bryngelson, Henry Le Berre, Anand Radhakrishnan, Jeffrey Young, Sunita Chandrasekaran, Florina Ciorba, et al. (6 additional authors not shown)

    Abstract: This paper assesses and reports the experience of ten teams working to port, validate, and benchmark several High Performance Computing applications on a novel GPU-accelerated Arm testbed system. The testbed consists of eight NVIDIA Arm HPC Developer Kit systems built by GIGABYTE, each one equipped with a server-class Arm CPU from Ampere Computing and an A100 data center GPU from NVIDIA Corp. The syst…

    Submitted 19 December, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

  2. Challenges Porting a C++ Template-Metaprogramming Abstraction Layer to Directive-based Offloading

    Authors: Jeffrey Kelling, Sergei Bastrakov, Alexander Debus, Thomas Kluge, Matt Leinhauser, Richard Pausch, Klaus Steiniger, Jan Stephan, René Widera, Jeff Young, Michael Bussmann, Sunita Chandrasekaran, Guido Juckeland

    Abstract: HPC systems employ a growing variety of compute accelerators with different architectures and from different vendors. Large scientific applications are required to run efficiently across these systems but need to retain a single code-base in order to not stifle development. Directive-based offloading programming models set out to provide the required portability, but, to existing codes, they thems…

    Submitted 24 January, 2022; v1 submitted 16 October, 2021; originally announced October 2021.

    Comments: 20 pages, 1 figure, 3 tables, WACCPD@SC21

    ACM Class: D.1.3; D.2.1; D.3.3

  3. arXiv:2110.08221

    cs.DC

    Metrics and Design of an Instruction Roofline Model for AMD GPUs

    Authors: Matthew Leinhauser, René Widera, Sergei Bastrakov, Alexander Debus, Michael Bussmann, Sunita Chandrasekaran

    Abstract: Due to the recent announcement of the Frontier supercomputer, many scientific application developers are working to make their applications compatible with AMD architectures (CPU-GPU), which means moving away from the traditional CPU and NVIDIA-GPU systems. Due to the current limitations of profiling tools for AMD GPUs, this shift leaves a void in how to measure application performance on AMD GPUs…

    Submitted 10 November, 2021; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: 14 pages, 7 figures, 2 tables, 4 equations, explains how to create an instruction roofline model for an AMD GPU as of Oct. 2021

  4. Transitioning from file-based HPC workflows to streaming data pipelines with openPMD and ADIOS2

    Authors: Franz Poeschel, Juncheng E, William F. Godoy, Norbert Podhorszki, Scott Klasky, Greg Eisenhauer, Philip E. Davis, Lipeng Wan, Ana Gainaru, Junmin Gu, Fabian Koller, René Widera, Michael Bussmann, Axel Huebl

    Abstract: This paper aims to create a transition path from file-based IO to streaming-based workflows for scientific applications in an HPC environment. By using the openPMD-api, traditional workflows limited by filesystem bottlenecks can be overcome and flexibly extended for in situ analysis. The openPMD-api is a library for the description of scientific data according to the Open Standard for Particle-Mes…

    Submitted 19 January, 2022; v1 submitted 13 July, 2021; originally announced July 2021.

    Comments: 18 pages, 9 figures, SMC2021, supplementary material at https://zenodo.org/record/4906276

  5. LLAMA: The Low-Level Abstraction For Memory Access

    Authors: Bernhard Manfred Gruber, Guilherme Amadio, Jakob Blomer, Alexander Matthes, René Widera, Michael Bussmann

    Abstract: The performance gap between CPU and memory widens continuously. Choosing the best memory layout for each hardware architecture is increasingly important as more and more programs become memory bound. For portable codes that run across heterogeneous hardware architectures, the choice of the memory layout for data structures is ideally decoupled from the rest of a program. This can be accomplished v…

    Submitted 9 March, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: 39 pages, 10 figures, 11 listings

    Journal ref: Softw Pract Exper. 2022; 1-27

  6. Tuning and optimization for a variety of many-core architectures without changing a single line of implementation code using the Alpaka library

    Authors: Alexander Matthes, René Widera, Erik Zenker, Benjamin Worpitz, Axel Huebl, Michael Bussmann

    Abstract: We present an analysis of optimizing performance of a single C++11 source code using the Alpaka hardware abstraction library. For this we use the general matrix multiplication (GEMM) algorithm in order to show that compilers can optimize Alpaka code effectively when tuning key parameters of the algorithm. We do not intend to rival existing, highly optimized DGEMM versions, but merely choose this e…

    Submitted 30 June, 2017; originally announced June 2017.

    Comments: Accepted paper for the P^3MA workshop at ISC 2017 in Frankfurt

    Journal ref: J.M. Kunkel et al. (Eds.): ISC High Performance Workshops 2017, LNCS 10524, pp. 496-514, 2017

  7. arXiv:1706.00522

    cs.PF physics.comp-ph

    On the Scalability of Data Reduction Techniques in Current and Upcoming HPC Systems from an Application Perspective

    Authors: Axel Huebl, Rene Widera, Felix Schmitt, Alexander Matthes, Norbert Podhorszki, Jong Youl Choi, Scott Klasky, Michael Bussmann

    Abstract: We implement and benchmark parallel I/O methods for the fully-manycore driven particle-in-cell code PIConGPU. Identifying throughput and overall I/O size as a major challenge for applications on today's and future HPC systems, we present a scaling law characterizing performance bottlenecks in state-of-the-art approaches for data reduction. Consequently, we propose, implement and verify multi-threa…

    Submitted 1 June, 2017; originally announced June 2017.

    Comments: 15 pages, 5 figures, accepted for DRBSD-1 in conjunction with ISC'17

    ACM Class: D.4.8; B.4.3; I.6.6

    Journal ref: J.M. Kunkel et al. (Eds.): ISC High Performance Workshops 2017, LNCS 10524, pp. 15-29, 2017

  8. In situ, steerable, hardware-independent and data-structure agnostic visualization with ISAAC

    Authors: Alexander Matthes, Axel Huebl, René Widera, Sebastian Grottel, Stefan Gumhold, Michael Bussmann

    Abstract: The computation power of supercomputers grows faster than the bandwidth of their storage and network. Especially applications using hardware accelerators like Nvidia GPUs cannot save enough data to be analyzed in a later step. There is a high risk of losing important scientific information. We introduce the in situ template library ISAAC which enables arbitrary applications like scientific simula…

    Submitted 28 November, 2016; originally announced November 2016.

    Journal ref: Supercomputing Frontiers and Innovations, [S.l.], v. 3, n. 4, p. 30-48, Oct. 2016

  9. Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond

    Authors: Erik Zenker, René Widera, Axel Huebl, Guido Juckeland, Andreas Knüpfer, Wolfgang E. Nagel, Michael Bussmann

    Abstract: With the appearance of the heterogeneous platform OpenPower, many-core accelerator devices have been coupled with Power host processors for the first time. Towards utilizing their full potential, it is worth investigating performance portable algorithms that allow to choose the best-fitting hardware for each domain-specific compute task. Suiting even the high level of parallelism on modern GPGPUs,…

    Submitted 12 June, 2016; v1 submitted 9 June, 2016; originally announced June 2016.

    Comments: 9 pages, 3 figures, accepted at IWOPH 2016

    Journal ref: Lecture Notes in Computer Science, 9945, pp 293-301, 2016

  10. Alpaka - An Abstraction Library for Parallel Kernel Acceleration

    Authors: Erik Zenker, Benjamin Worpitz, René Widera, Axel Huebl, Guido Juckeland, Andreas Knüpfer, Wolfgang E. Nagel, Michael Bussmann

    Abstract: Porting applications to new hardware or programming models is a tedious and error-prone process. Any help that eases these burdens saves developer time that can then be invested into the advancement of the application itself instead of preserving the status quo on a new platform. The Alpaka library defines and implements an abstract hierarchical redundant parallelism model. The model explo…

    Submitted 26 February, 2016; originally announced February 2016.

    Comments: 10 pages, 10 figures