Search | arXiv e-print repository

arXiv:1904.07725 [pdf, other]

doi 10.1109/HPCC/SmartCity/DSS.2018.00046

The DEEP-ER project: I/O and resiliency extensions for the Cluster-Booster architecture

Authors: Anke Kreuzer, Norbert Eicker, Jorge Amaya, Raphael Leger, Estela Suarez

Abstract: The recently completed research project DEEP-ER has developed a variety of hardware and software technologies to improve the I/O capabilities of next-generation high-performance computers, and to enable applications to recover from the larger hardware failure rates expected on these machines. The heterogeneous Cluster-Booster architecture, first introduced in the predecessor DEEP project, has been extended by a multi-level memory hierarchy employing non-volatile and network-attached memory devices. Based on this hardware infrastructure, an I/O and resiliency software stack has been implemented, combining and extending well-established libraries and software tools and adhering to standard user interfaces. Real-world scientific codes have tested the project's developments and demonstrated the improvements achieved without compromising the portability of the applications.

Submitted 15 April, 2019; originally announced April 2019.

Comments: 8 pages, 10 figures, HPCC conference. arXiv admin note: text overlap with arXiv:1904.05275

Journal ref: 2018 IEEE 20th International Conference on High Performance Computing and Communications (HPCC)

arXiv:1904.05275 [pdf, other]

doi 10.1109/IPDPSW.2018.00019

Application performance on a Cluster-Booster system

Authors: Anke Kreuzer, Jorge Amaya, Norbert Eicker, Estela Suarez

Abstract: The DEEP projects have developed a variety of hardware and software technologies aimed at improving the efficiency and usability of next-generation high-performance computers. They revolve around an innovative concept for heterogeneous systems: the Cluster-Booster architecture. In it, a general-purpose cluster is tightly coupled to a many-core system (the Booster). This modular way of integrating heterogeneous components enables applications to freely choose the kind of computing resources on which they run most efficiently. Codes might even be partitioned to map the specific requirements of code parts onto the best-suited hardware. This paper presents, for the first time, measurements from a real-world scientific application demonstrating the performance gain achieved with this kind of code-partition approach.

Submitted 10 April, 2019; originally announced April 2019.

Comments: 10 pages, 8 figures, IPDPS 2018 workshop HCW

Journal ref: 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), IPDPS, Vancouver, Canada, 21 May 2018 - 25 May 2018 IEEE 69 - 78 (2018)

arXiv:0911.2174 [pdf, other]

QPACE -- a QCD parallel computer based on Cell processors

Authors: H. Baier, H. Boettiger, M. Drochner, N. Eicker, U. Fischer, Z. Fodor, A. Frommer, C. Gomez, G. Goldrian, S. Heybrock, D. Hierl, M. Hüsken, T. Huth, B. Krill, J. Lauritsen, T. Lippert, T. Maurer, B. Mendl, N. Meyer, A. Nobile, I. Ouda, M. Pivanti, D. Pleiter, M. Ries, A. Schäfer , et al. (10 additional authors not shown)

Abstract: QPACE is a novel parallel computer developed primarily for lattice QCD simulations. The compute power is provided by the IBM PowerXCell 8i processor, an enhanced version of the Cell processor used in the PlayStation 3. The QPACE nodes are interconnected by a custom, application-optimized 3-dimensional torus network implemented on an FPGA. To achieve the very high packaging density of 26 TFlops per rack, a new water cooling concept has been developed and successfully realized. In this paper we give an overview of the architecture and highlight some important technical details of the system. Furthermore, we provide initial performance results and report on the installation of 8 QPACE racks providing an aggregate peak performance of 200 TFlops.

Submitted 23 December, 2009; v1 submitted 11 November, 2009; originally announced November 2009.

Comments: 21 pages. Poster by T. Maurer and plenary talk by D. Pleiter presented at the "XXVII International Symposium on Lattice Field Theory", July 26-31 2009, Peking University, Beijing, China. Information on recent Green500 ranking added and list of authors extended

Journal ref: PoS LAT2009:001,2009

arXiv:hep-lat/0307015 [pdf, ps, other]

On the scaling of computational particle physics codes on cluster computers

Authors: Z. Sroczynski, N. Eicker, Th. Lippert, B. Orth, K. Schilling

Abstract: Many applications in computational science are sufficiently compute-intensive that they depend on the power of parallel computing for viability. For all but the "embarrassingly parallel" problems, the performance depends upon the level of granularity that can be achieved on the computer platform. Our computational particle physics applications require machines that can support a wide range of granularities, but in general, compute-intensive state-of-the-art projects will require finely grained distributions. Of the different types of machines available for the task, we consider cluster computers. The use of clusters of commodity computers in high performance computing has many advantages, including the raw price/performance ratio and the flexibility of machine configuration and upgrade. Here we focus on what is usually considered the weak point of cluster technology: the scaling behaviour when faced with a numerically intensive parallel computation. To this end, we examine the scaling of our own applications from numerical quantum field theory on a cluster and infer conclusions about the more general case.

Submitted 10 July, 2003; v1 submitted 9 July, 2003; originally announced July 2003.

Comments: 26pp. LaTeX2e using package graphicx. 16 PostScript figures

Report number: LTH 583

arXiv:cs/0303016 [pdf, ps, other]

Fast Parallel I/O on Cluster Computers

Authors: Thomas Duessel, Norbert Eicker, Florin Isaila, Thomas Lippert, Thomas Moschny, Hartmut Neff, Klaus Schilling, Walter Tichy

Abstract: Today's cluster computers suffer from slow I/O, which slows down I/O-intensive applications. We show that fast disk I/O can be achieved by operating a parallel file system over fast networks such as Myrinet or Gigabit Ethernet. In this paper, we demonstrate how the ParaStation3 communication system helps speed up parallel I/O on clusters, using the open source parallel virtual file system (PVFS) as testbed and production system. We describe the set-up of PVFS on the Alpha-Linux-Cluster-Engine (ALiCE) located at Wuppertal University, Germany. Benchmarks on ALiCE achieve write performance of up to 1 GB/s from a 32-processor compute partition to a 32-processor PVFS I/O partition, outperforming known benchmark results for PVFS on the same network by more than a factor of 2. Read performance from buffer cache reaches up to 2.2 GB/s. Our benchmarks are giant, I/O-intensive eigenmode problems from lattice quantum chromodynamics, demonstrating stability and performance of PVFS over ParaStation in large-scale production runs.

Submitted 19 March, 2003; originally announced March 2003.

Comments: 22 pages, 10 figures

ACM Class: B.4.3; C.1.2; C.2.2; D.4.3

Showing 1–5 of 5 results for author: Eicker, N