Showing 1–2 of 2 results for author: Krajsek, K

Search v0.5.6 released 2020-02-24

arXiv:2108.11976 [pdf, other]

cs.DC cs.LG

JUWELS Booster -- A Supercomputer for Large-Scale AI Research

Authors: Stefan Kesselheim, Andreas Herten, Kai Krajsek, Jan Ebert, Jenia Jitsev, Mehdi Cherti, Michael Langguth, Bing Gong, Scarlet Stadtler, Amirpasha Mozaffari, Gabriele Cavallaro, Rocco Sedona, Alexander Schug, Alexandre Strube, Roshni Kamath, Martin G. Schultz, Morris Riedel, Thomas Lippert

Abstract: In this article, we present JUWELS Booster, a recently commissioned high-performance computing system at the Jülich Supercomputing Center. With its system architecture, most importantly its large number of powerful Graphics Processing Units (GPUs) and its fast interconnect via InfiniBand, it is an ideal machine for large-scale Artificial Intelligence (AI) research and applications. We detail its system architecture, parallel, distributed model training, and benchmarks indicating its outstanding performance. We exemplify its potential for research application by presenting large-scale AI research highlights from various scientific fields that require such a facility.

Submitted 30 June, 2021; originally announced August 2021.

Comments: 12 pages, 5 figures. Accepted at ISC 2021, Workshop Deep Learning on Supercomputers. This is a duplicate submission because my previous submission has been on hold for several weeks and my attempts to contact the moderators have failed.

Report number: 1234567Dummy

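The JUWELS Booster abstract above mentions parallel, distributed model training. A common scheme on GPU clusters is synchronous data parallelism: each worker holds a model replica, computes gradients on its own shard of the mini-batch, and the gradients are averaged with an all-reduce before every replica applies the same update. The sketch below simulates that scheme in plain Python for a one-parameter least-squares model. The model, the function names (`local_gradient`, `allreduce_mean`, `data_parallel_step`) and the in-process averaging that stands in for an MPI or NCCL all-reduce are illustrative assumptions, not the training setup described in the paper.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair for the hypothetical model y = w * x


def local_gradient(w: float, batch: Sequence[Sample]) -> float:
    """Gradient of the mean squared error mean((w*x - y)^2) w.r.t. w on one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def allreduce_mean(values: List[float]) -> float:
    """Hypothetical in-process stand-in for an all-reduce (sum) divided by the world size.
    In a real cluster every worker would receive this same value."""
    return sum(values) / len(values)


def data_parallel_step(w: float, shards: Sequence[Sequence[Sample]], lr: float) -> float:
    """One synchronous data-parallel SGD step: local gradients, averaged, same update everywhere."""
    grads = [local_gradient(w, shard) for shard in shards]  # computed concurrently on real hardware
    g = allreduce_mean(grads)
    return w - lr * g


if __name__ == "__main__":
    data = [(float(x), 3.0 * x) for x in range(1, 9)]
    shards = [data[i:i + 2] for i in range(0, len(data), 2)]  # 4 simulated workers, 2 samples each
    w = 0.0
    for _ in range(100):
        w = data_parallel_step(w, shards, lr=0.01)
    print(round(w, 6))  # prints 3.0
```

With equally sized shards, the average of the per-shard mean gradients equals the full-batch gradient, so synchronous data parallelism reproduces single-device training up to floating-point rounding while splitting the compute across workers.
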
arXiv:2007.13552 [pdf, ps, other]

cs.DC cs.LG cs.MS

doi 10.1109/BigData50022.2020.9378050

HeAT -- a Distributed and GPU-accelerated Tensor Framework for Data Analytics

Authors: Markus Götz, Daniel Coquelin, Charlotte Debus, Kai Krajsek, Claudia Comito, Philipp Knechtges, Björn Hagemeier, Michael Tarnawa, Simon Hanselmann, Martin Siggel, Achim Basermann, Achim Streit

Abstract: To cope with the rapid growth in available data, the efficiency of data analysis and machine learning libraries has recently received increased attention. Although great advancements have been made in traditional array-based computations, most are limited by the resources available on a single computation node. Consequently, novel approaches must be made to exploit distributed resources, e.g. distributed memory architectures. To this end, we introduce HeAT, an array-based numerical programming framework for large-scale parallel processing with an easy-to-use NumPy-like API. HeAT utilizes PyTorch as a node-local eager execution engine and distributes the workload on arbitrarily large high-performance computing systems via MPI. It provides both low-level array computations, as well as assorted higher-level algorithms. With HeAT, it is possible for a NumPy user to take full advantage of their available resources, significantly lowering the barrier to distributed data analysis. When compared to similar frameworks, HeAT achieves speedups of up to two orders of magnitude.

Submitted 11 November, 2020; v1 submitted 27 July, 2020; originally announced July 2020.

Comments: 10 pages, 8 figures, 5 listings, 1 table

ACM Class: C.1.2; C.2.4; D.1.3; G.1.3; G.4; I.2.0; I.2.5; I.5.5
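The HeAT abstract above describes an array that is partitioned across MPI processes, each of which computes on its local chunk with a node-local engine before the partial results are combined. The sketch below illustrates that general pattern in plain Python, simulating the processes inside a single interpreter. The block-partitioning rule, the function names (`chunk_bounds`, `scatter`, `allreduce_sum`, `distributed_mean`) and the in-process stand-in for an MPI all-reduce are assumptions for illustration, not HeAT's actual API or implementation.

```python
from typing import List, Sequence, Tuple


def chunk_bounds(n: int, nprocs: int, rank: int) -> Tuple[int, int]:
    """[start, stop) range owned by `rank` when n elements are block-distributed
    over nprocs processes; the first n % nprocs ranks get one extra element."""
    base, extra = divmod(n, nprocs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def scatter(data: Sequence[float], nprocs: int) -> List[List[float]]:
    """Split a 1-D sequence into the local chunks each simulated process would hold."""
    return [list(data[slice(*chunk_bounds(len(data), nprocs, r))]) for r in range(nprocs)]


def allreduce_sum(local_values: List[float]) -> List[float]:
    """Hypothetical stand-in for MPI Allreduce with SUM: every rank receives the global total."""
    total = sum(local_values)
    return [total] * len(local_values)


def distributed_mean(data: Sequence[float], nprocs: int) -> float:
    """Mean of a block-distributed array: local partial sums, then one global reduction."""
    shards = scatter(data, nprocs)
    local_sums = [sum(s) for s in shards]      # node-local computation on each chunk
    local_counts = [len(s) for s in shards]    # ranks may own different chunk sizes
    global_sum = allreduce_sum(local_sums)[0]
    global_count = allreduce_sum(local_counts)[0]
    return global_sum / global_count


if __name__ == "__main__":
    print(distributed_mean(list(range(10)), 4))  # prints 4.5
```

Reducing sums and counts separately, rather than averaging per-chunk means, keeps the result correct when chunks have unequal lengths, which happens whenever the array length is not a multiple of the process count.
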
