
Showing 1–6 of 6 results for author: Oryspayev, D

  1. arXiv:2311.04797  [pdf, other]

    cs.DC cs.PF

    CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate Evasion

    Authors: Jan Laukemann, Thomas Gruber, Georg Hager, Dossay Oryspayev, Gerhard Wellein

    Abstract: In this paper we analyze the MPI-only version of the CloverLeaf code from the SPEChpc 2021 benchmark suite on recent Intel Xeon "Ice Lake" and "Sapphire Rapids" server CPUs. We observe peculiar breakdowns in performance when the number of processes is prime. Investigating this effect, we create first-principles data traffic models for each of the stencil-like hotspot loops. With application measur…

    Submitted 17 May, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: 19 pages including artifact appendix; 11 figures, 1 table; numerous corrections, esp. in Table 1

  2. arXiv:2110.10765  [pdf, other]

    cs.DC cs.CE cs.MS cs.PF nucl-th

    Accelerating quantum many-body configuration interaction with directives

    Authors: Brandon Cook, Patrick J. Fasano, Pieter Maris, Chao Yang, Dossay Oryspayev

    Abstract: Many-Fermion Dynamics-nuclear, or MFDn, is a configuration interaction (CI) code for nuclear structure calculations. It is a platform-independent Fortran 90 code using a hybrid MPI+X programming model. For CPU platforms the application has a robust and optimized OpenMP implementation for shared memory parallelism. As part of the NESAP application readiness program for NERSC's latest Perlmutter sys…

    Submitted 20 October, 2021; originally announced October 2021.

    Comments: 22 pages, 7 figures, 11 code listings, WACCPD@SC21

  3. arXiv:2109.00485  [pdf, other]

    cs.DC cs.MS math.NA nucl-th

    Accelerating an Iterative Eigensolver for Nuclear Structure Configuration Interaction Calculations on GPUs using OpenACC

    Authors: Pieter Maris, Chao Yang, Dossay Oryspayev, Brandon Cook

    Abstract: To accelerate the solution of large eigenvalue problems arising from many-body calculations in nuclear physics on distributed-memory parallel systems equipped with general-purpose Graphic Processing Units (GPUs), we modified a previously developed hybrid MPI/OpenMP implementation of an eigensolver written in FORTRAN 90 by using an OpenACC directives based programming model. Such an approach requir…

    Submitted 1 September, 2021; originally announced September 2021.

    Comments: 26 pages, 13 figures

  4. arXiv:2107.10346  [pdf, other]

    cs.MS

    Comparing OpenMP Implementations With Applications Across A64FX Platforms

    Authors: Benjamin Michalowicz, Eric Raut, Yan Kang, Tony Curtis, Barbara Chapman, Dossay Oryspayev

    Abstract: The development of the A64FX processor by Fujitsu has created a massive innovation in High-Performance Computing and the birth of Fugaku: the current world's fastest supercomputer. A variety of tools are used to analyze the run-times and performances of several applications, and in particular, how these applications scale on the A64FX processor. We examine the performance and behavior of applicati…

    Submitted 21 July, 2021; originally announced July 2021.

  5. Comparing the behavior of OpenMP Implementations with various Applications on two different Fujitsu A64FX platforms

    Authors: Benjamin Michalowicz, Eric Raut, Yan Kang, Tony Curtis, Barbara Chapman, Dossay Oryspayev

    Abstract: The development of the A64FX processor by Fujitsu has been a massive innovation in vectorized processors and led to Fugaku: the current world's fastest supercomputer. We use a variety of tools to analyze the behavior and performance of several OpenMP applications with different compilers, and how these applications scale on the different A64FX processors on clusters at Stony Brook University and R…

    Submitted 17 June, 2021; originally announced June 2021.

  6. Ookami: Deployment and Initial Experiences

    Authors: Andrew Burford, Alan C. Calder, David Carlson, Barbara Chapman, Firat Coşkun, Tony Curtis, Catherine Feldman, Robert J. Harrison, Yan Kang, Benjamin Michalowicz, Eric Raut, Eva Siegmann, Daniel G. Wood, Robert L. Deleon, Mathew Jones, Nikolay A. Simakov, Joseph P. White, Dossay Oryspayev

    Abstract: Ookami is a computer technology testbed supported by the United States National Science Foundation. It provides researchers with access to the A64FX processor developed by Fujitsu in collaboration with RIKEN for the Japanese path to exascale computing, as deployed in Fugaku, the fastest computer in the world. By focusing on crucial architectural details, the ARM-based, multi-core, 512-bit SIMD-vec…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: 14 pages, 7 figures, PEARC '21: Practice and Experience in Advanced Research Computing, July 18–22, 2021, Boston, MA, USA