Showing 1–2 of 2 results for author: Worpitz, B
-
Tuning and optimization for a variety of many-core architectures without changing a single line of implementation code using the Alpaka library
Authors:
Alexander Matthes,
René Widera,
Erik Zenker,
Benjamin Worpitz,
Axel Huebl,
Michael Bussmann
Abstract:
We present an analysis on optimizing performance of a single C++11 source code using the Alpaka hardware abstraction library. For this we use the general matrix multiplication (GEMM) algorithm in order to show that compilers can optimize Alpaka code effectively when tuning key parameters of the algorithm. We do not intend to rival existing, highly optimized DGEMM versions, but merely choose this example to prove that Alpaka allows for platform-specific tuning with a single source code. In addition we analyze the optimization potential available with vendor-specific compilers when confronted with the heavily templated abstractions of Alpaka. We specifically test the code for bleeding-edge architectures such as Nvidia's Tesla P100, Intel's Knights Landing (KNL) and Haswell architecture as well as IBM's Power8 system. On some of these we are able to reach almost 50% of the peak floating point operation performance using the aforementioned means. When adding compiler-specific #pragmas we are able to reach 5 TFLOPS on a P100 and over 1 TFLOPS on a KNL system.
Submitted 30 June, 2017;
originally announced June 2017.
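The abstract above describes tuning key parameters of a GEMM without touching the implementation. As a minimal sketch of that idea in plain C++11 (it does not use Alpaka, and the names `tiled_gemm`, `multiply` and `GEMM_TILE_SIZE` are hypothetical, introduced here only for illustration), the tile size can be a compile-time parameter that is set per platform from the build command line:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Alpaka: computes C += A * B for square
// n x n row-major matrices, processed in TileSize x TileSize blocks.
template <std::size_t TileSize>
void tiled_gemm(std::size_t n,
                const std::vector<double>& a,
                const std::vector<double>& b,
                std::vector<double>& c)
{
    static_assert(TileSize > 0, "tile size must be positive");
    for (std::size_t ii = 0; ii < n; ii += TileSize) {
        for (std::size_t kk = 0; kk < n; kk += TileSize) {
            for (std::size_t jj = 0; jj < n; jj += TileSize) {
                // Clamp the tile so sizes that do not divide n still work.
                const std::size_t iEnd = std::min(ii + TileSize, n);
                const std::size_t kEnd = std::min(kk + TileSize, n);
                const std::size_t jEnd = std::min(jj + TileSize, n);
                for (std::size_t i = ii; i < iEnd; ++i)
                    for (std::size_t k = kk; k < kEnd; ++k)
                        for (std::size_t j = jj; j < jEnd; ++j)
                            c[i * n + j] += a[i * n + k] * b[k * n + j];
            }
        }
    }
}

// Assumed tuning knob: override per platform, e.g. -DGEMM_TILE_SIZE=64.
#ifndef GEMM_TILE_SIZE
#define GEMM_TILE_SIZE 16
#endif

// Returns A * B using whatever tile size was selected at build time.
std::vector<double> multiply(std::size_t n,
                             const std::vector<double>& a,
                             const std::vector<double>& b)
{
    std::vector<double> c(n * n, 0.0);
    tiled_gemm<GEMM_TILE_SIZE>(n, a, b, c);
    return c;
}
```

Changing the tile size only changes the loop blocking, not the result, so different values can be benchmarked on each target while the kernel source stays identical.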
-
Alpaka - An Abstraction Library for Parallel Kernel Acceleration
Authors:
Erik Zenker,
Benjamin Worpitz,
René Widera,
Axel Huebl,
Guido Juckeland,
Andreas Knüpfer,
Wolfgang E. Nagel,
Michael Bussmann
Abstract:
Porting applications to new hardware or programming models is a tedious and error-prone process. Any help that eases these burdens saves developer time that can then be invested in advancing the application itself instead of preserving the status quo on a new platform.
The Alpaka library defines and implements an abstract hierarchical redundant parallelism model. The model exploits parallelism and memory hierarchies on a node at all levels available in current hardware. By doing so, it achieves platform and performance portability across various types of accelerators by ignoring specific unsupported levels and utilizing only the ones supported on a specific accelerator. All hardware types (multi- and many-core CPUs, GPUs and other accelerators) are supported and can be programmed in the same way. The Alpaka C++ template interface allows for straightforward extension of the library to support other accelerators and specialization of its internals for optimization.
Running Alpaka applications on a new (and supported) platform requires changing only one source code line instead of a lot of #ifdefs.
Submitted 26 February, 2016;
originally announced February 2016.
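The "one source code line" claim in the abstract above can be illustrated with a small sketch in plain C++11. This is not Alpaka's actual interface: `SerialBackend`, `BlockedBackend`, `Backend` and `axpy` are hypothetical names, and both back ends run sequentially here as stand-ins for real accelerators. The point is only that the kernel is written once against a template interface and the target is chosen through a single type alias:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical back end: a flat loop, ignoring any block level.
struct SerialBackend {
    template <typename Kernel>
    static void run(std::size_t n, Kernel kernel) {
        for (std::size_t i = 0; i < n; ++i) kernel(i);
    }
};

// Hypothetical back end: a two-level (block, element) loop. A real
// accelerator back end would map these levels onto hardware parallelism.
struct BlockedBackend {
    template <typename Kernel>
    static void run(std::size_t n, Kernel kernel) {
        const std::size_t blockSize = 4;
        for (std::size_t begin = 0; begin < n; begin += blockSize) {
            const std::size_t end = std::min(begin + blockSize, n);
            for (std::size_t i = begin; i < end; ++i) kernel(i);
        }
    }
};

// The single line to change when targeting a different back end.
using Backend = BlockedBackend;

// Kernel written once: returns y + alpha * x, element by element.
std::vector<double> axpy(double alpha,
                         const std::vector<double>& x,
                         std::vector<double> y)
{
    const double* xp = x.data();
    double* yp = y.data();
    Backend::run(y.size(), [=](std::size_t i) { yp[i] += alpha * xp[i]; });
    return y;
}
```

Because the kernel only sees an element index, swapping the alias to `SerialBackend` leaves `axpy` untouched and yields the same result; no preprocessor branches are needed in the kernel code.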