Search | arXiv e-print repository

DECICE: Device-Edge-Cloud Intelligent Collaboration Framework

Authors: Julian Kunkel, Christian Boehme, Jonathan Decker, Fabrizio Magugliani, Dirk Pleiter, Bastian Koller, Karthee Sivalingam, Sabri Pllana, Alexander Nikolov, Mujdat Soyturk, Christian Racca, Andrea Bartolini, Adrian Tate, Berkay Yaman

Abstract: DECICE is a Horizon Europe project that is developing an AI-enabled open and portable management framework for automatic and adaptive optimization and deployment of applications in a computing continuum ranging from IoT sensors on the Edge to large-scale Cloud / HPC computing infrastructures. In this paper, we describe the DECICE framework and architecture. Furthermore, we highlight use-cases for framework evaluation: intelligent traffic intersection, magnetic resonance imaging, and emergency response.

Submitted 4 May, 2023; originally announced May 2023.

arXiv:2109.03592 [pdf, ps, other]

Strong Scaling of OpenACC enabled Nek5000 on several GPU based HPC systems

Authors: Jonathan Vincent, Jing Gong, Martin Karp, Adam Peplinski, Niclas Jansson, Artur Podobas, Andreas Jocksch, Jie Yao, Fazle Hussain, Stefano Markidis, Matts Karlsson, Dirk Pleiter, Erwin Laure, Philipp Schlatter

Abstract: We present new results on the strong parallel scaling for the OpenACC-accelerated implementation of the high-order spectral element fluid dynamics solver Nek5000. The test case considered consists of a direct numerical simulation of fully-developed turbulent flow in a straight pipe, at two different Reynolds numbers $Re_τ=360$ and $Re_τ=550$, based on friction velocity and pipe radius. The strong scaling is tested on several GPU-enabled HPC systems, including the Swiss Piz Daint system, TACC's Longhorn, Jülich's JUWELS Booster, and Berzelius in Sweden. The performance results show that a speed-up of between 3 and 5 can be achieved using the GPU-accelerated version compared with the CPU version on these different systems. The run-time for 20 timesteps reduces from 43.5 to 13.2 seconds when increasing the number of GPUs from 64 to 512 for the $Re_τ=550$ case on the JUWELS Booster system. This illustrates the potential of the GPU-accelerated version for high throughput. At the same time, the strong scaling limit is significantly larger for GPUs, at about $2000-5000$ elements per rank, compared to about $50-100$ for a CPU rank.

Submitted 4 November, 2021; v1 submitted 8 September, 2021; originally announced September 2021.

Comments: 9 pages, 8 figures. Submitted to HPC-Asia 2022 conference, updated to address reviewers' comments

ACM Class: G.4; J.2; C.1

arXiv:2102.05888 [pdf]

Brain Modelling as a Service: The Virtual Brain on EBRAINS

Authors: Michael Schirner, Lia Domide, Dionysios Perdikis, Paul Triebkorn, Leon Stefanovski, Roopa Pai, Paula Popa, Bogdan Valean, Jessica Palmer, Chloê Langford, André Blickensdörfer, Michiel van der Vlag, Sandra Diaz-Pier, Alexander Peyser, Wouter Klijn, Dirk Pleiter, Anne Nahm, Oliver Schmid, Marmaduke Woodman, Lyuba Zehl, Jan Fousek, Spase Petkoski, Lionel Kusch, Meysam Hashemi, Daniele Marinazzo, et al. (19 additional authors not shown)

Abstract: The Virtual Brain (TVB) is now available as an open-source cloud ecosystem on EBRAINS, a shared digital research platform for brain science. It offers services for constructing, simulating and analysing brain network models (BNMs) including the TVB network simulator; magnetic resonance imaging (MRI) processing pipelines to extract structural and functional connectomes; multiscale co-simulation of spiking and large-scale networks; a domain specific language for automatic high-performance code generation from user-specified models; simulation-ready BNMs of patients and healthy volunteers; Bayesian inference of epilepsy spread; data and code for mouse brain simulation; and extensive educational material. TVB cloud services facilitate reproducible online collaboration and discovery of data assets, models, and software embedded in scalable and secure workflows, a precondition for research on large cohort data sets, better generalizability and clinical translation.

Submitted 29 March, 2021; v1 submitted 11 February, 2021; originally announced February 2021.

arXiv:2010.12195 [pdf, other]

Performance Evaluation of ParalleX Execution model on Arm-based Platforms

Authors: Nikunj Gupta, Rohit Ashiwal, Bine Brank, Sateesh K. Peddoju, Dirk Pleiter

Abstract: The HPC community shows a keen interest in creating diversity in the CPU ecosystem. The advent of Arm-based processors provides an alternative to the existing HPC ecosystem, which is primarily dominated by x86 processors. In this paper, we port an Asynchronous Many-Task runtime system based on the ParalleX model, i.e., High Performance ParalleX (HPX), and evaluate it on the Arm ecosystem with a suite of benchmarks. We wrote these benchmarks with an emphasis on vectorization and distributed scaling. We present the performance results on a variety of Arm processors and compare them with their x86 brethren from Intel. We show that the results obtained are as good as or better than those of their x86 brethren. Finally, we also discuss a few drawbacks of the present Arm ecosystem.

Submitted 23 October, 2020; originally announced October 2020.

arXiv:1908.02702 [pdf, other]

doi 10.1007/978-3-030-34356-9_31

Performance Comparison for Neuroscience Application Benchmarks

Authors: Andreas Herten, Thorsten Hater, Wouter Klijn, Dirk Pleiter

Abstract: Researchers within the Human Brain Project and related projects have in the last couple of years expanded their needs for high-performance computing infrastructures. The needs arise from a diverse set of science challenges that range from large-scale simulations of brain models to processing of extreme-scale experimental data sets. The ICEI project, which is in the process of creating a distributed infrastructure optimised for brain research, started to build up a set of benchmarks that reflect the diversity of applications in this field. In this paper we analyse the performance of some selected benchmarks on IBM POWER8 and Intel Skylake based systems with and without GPUs.

Submitted 7 August, 2019; originally announced August 2019.

Comments: Presented at ISC19 Conference Workshop IWOPH (International Workshop on OpenPOWER for HPC)

arXiv:1901.07294 [pdf, other]

doi 10.1109/CLUSTER.2018.00079

SVE-enabling Lattice QCD Codes

Authors: Nils Meyer, Peter Georg, Dirk Pleiter, Stefan Solbrig, Tilo Wettig

Abstract: Optimization of applications for supercomputers of the highest performance class requires parallelization at multiple levels using different techniques. In this contribution we focus on parallelization of particle physics simulations through vector instructions. With the advent of the Scalable Vector Extension (SVE) ISA, future ARM-based processors are expected to provide a significant level of parallelism at this level.

Submitted 22 January, 2019; originally announced January 2019.

Comments: 6 pages

ACM Class: C.1.2

Journal ref: 2018 IEEE International Conference on Cluster Computing (CLUSTER), p. 623

arXiv:1807.03632 [pdf, other]

doi 10.1145/3203217.3205341

The SAGE Project: a Storage Centric Approach for Exascale Computing

Authors: Sai Narasimhamurthy, Nikita Danilov, Sining Wu, Ganesan Umanesan, Steven Wei-der Chien, Sergio Rivas-Gomez, Ivy Bo Peng, Erwin Laure, Shaun de Witt, Dirk Pleiter, Stefano Markidis

Abstract: SAGE (Percipient StorAGe for Exascale Data Centric Computing) is a European Commission funded project towards the era of Exascale computing. Its goal is to design and implement a Big Data/Extreme Computing (BDEC) capable infrastructure with associated software stack. The SAGE system follows a "storage centric" approach as it is capable of storing and processing large data volumes at the Exascale regime. SAGE addresses the convergence of Big Data Analysis and HPC in an era of next-generation data centric computing. This convergence is driven by the proliferation of massive data sources, such as large, dispersed scientific instruments and sensors where data needs to be processed, analyzed and integrated into simulations to derive scientific and innovative insights. A first prototype of the SAGE system has been implemented and installed at the Jülich Supercomputing Centre. The SAGE storage system consists of multiple types of storage device technologies in a multi-tier I/O hierarchy, including flash, disk, and non-volatile memory technologies. The main SAGE software component is the Seagate Mero Object Storage that is accessible via the Clovis API and higher level interfaces. The SAGE project also includes scientific applications for the validation of the SAGE concepts. The objective of this paper is to present the SAGE project concepts and the prototype of the SAGE platform, and to discuss the software architecture of the SAGE system.

Submitted 6 July, 2018; originally announced July 2018.

Comments: Submitted to Computing Frontiers 2018. arXiv admin note: substantial text overlap with arXiv:1805.00556

arXiv:1805.00556 [pdf, other]

doi 10.1016/j.parco.2018.03.002

SAGE: Percipient Storage for Exascale Data Centric Computing

Authors: Sai Narasimhamurthy, Nikita Danilov, Sining Wu, Ganesan Umanesan, Stefano Markidis, Sergio Rivas-Gomez, Ivy Bo Peng, Erwin Laure, Dirk Pleiter, Shaun de Witt

Abstract: We aim to implement a Big Data/Extreme Computing (BDEC) capable system infrastructure as we head towards the era of Exascale computing - termed SAGE (Percipient StorAGe for Exascale Data Centric Computing). The SAGE system will be capable of storing and processing immense volumes of data at the Exascale regime, and provide the capability for Exascale class applications to use such a storage infrastructure. SAGE addresses the increasing overlaps between Big Data Analysis and HPC in an era of next-generation data centric computing that has developed due to the proliferation of massive data sources, such as large, dispersed scientific instruments and sensors, whose data needs to be processed, analyzed and integrated into simulations to derive scientific and innovative insights. Indeed, Exascale I/O, as a problem that has not been sufficiently dealt with for simulation codes, is appropriately addressed by the SAGE platform. The objective of this paper is to discuss the software architecture of the SAGE system and look at early results we have obtained employing some of its key methodologies, as the system continues to evolve.

Submitted 1 May, 2018; originally announced May 2018.

Journal ref: Parallel Computing, 23 March 2018

arXiv:1502.04025 [pdf, other]

QPACE 2 and Domain Decomposition on the Intel Xeon Phi

Authors: Paul Arts, Jacques Bloch, Peter Georg, Benjamin Glaessle, Simon Heybrock, Yu Komatsubara, Robert Lohmayer, Simon Mages, Bernhard Mendl, Nils Meyer, Alessio Parcianello, Dirk Pleiter, Florian Rappl, Mauro Rossi, Stefan Solbrig, Giampietro Tecchiolli, Tilo Wettig, Gianpaolo Zanier

Abstract: We give an overview of QPACE 2, which is a custom-designed supercomputer based on Intel Xeon Phi processors, developed in a collaboration of Regensburg University and Eurotech. We give some general recommendations for how to write high-performance code for the Xeon Phi and then discuss our implementation of a domain-decomposition-based solver and present a number of benchmarks.

Submitted 13 February, 2015; originally announced February 2015.

Comments: plenary talk at Lattice 2014, to appear in the conference proceedings PoS(LATTICE2014), 15 pages, 9 figures

arXiv:0911.2174 [pdf, other]

QPACE -- a QCD parallel computer based on Cell processors

Authors: H. Baier, H. Boettiger, M. Drochner, N. Eicker, U. Fischer, Z. Fodor, A. Frommer, C. Gomez, G. Goldrian, S. Heybrock, D. Hierl, M. Hüsken, T. Huth, B. Krill, J. Lauritsen, T. Lippert, T. Maurer, B. Mendl, N. Meyer, A. Nobile, I. Ouda, M. Pivanti, D. Pleiter, M. Ries, A. Schäfer, et al. (10 additional authors not shown)

Abstract: QPACE is a novel parallel computer which has been developed to be primarily used for lattice QCD simulations. The compute power is provided by the IBM PowerXCell 8i processor, an enhanced version of the Cell processor that is used in the Playstation 3. The QPACE nodes are interconnected by a custom, application optimized 3-dimensional torus network implemented on an FPGA. To achieve the very high packaging density of 26 TFlops per rack a new water cooling concept has been developed and successfully realized. In this paper we give an overview of the architecture and highlight some important technical details of the system. Furthermore, we provide initial performance results and report on the installation of 8 QPACE racks providing an aggregate peak performance of 200 TFlops.

Submitted 23 December, 2009; v1 submitted 11 November, 2009; originally announced November 2009.

Comments: 21 pages. Poster by T. Maurer and plenary talk by D. Pleiter presented at the "XXVII International Symposium on Lattice Field Theory", July 26-31 2009, Peking University, Beijing, China. Information on recent Green500 ranking added and list of authors extended

Journal ref: PoS LAT2009:001,2009

Showing 1–10 of 10 results for author: Pleiter, D