
Showing 1–12 of 12 results for author: Yamazaki, I

  1. arXiv:2402.15033  [pdf, other]

    math.NA cs.DC

    Two-Stage Block Orthogonalization to Improve Performance of $s$-step GMRES

    Authors: Ichitaro Yamazaki, Andrew J. Higgins, Erik G. Boman, Daniel B. Szyld

    Abstract: On current computer architectures, GMRES' performance can be limited by its communication cost to generate orthonormal basis vectors of the Krylov subspace. To address this performance bottleneck, its $s$-step variant orthogonalizes a block of $s$ basis vectors at a time, potentially reducing the communication cost by a factor of $s$. Unfortunately, for a large step size $s$, the solver can genera…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Accepted for publication in IPDPS'24

  2. arXiv:2309.05868  [pdf, ps, other]

    math.NA

    Analysis of Randomized Householder-Cholesky QR Factorization with Multisketching

    Authors: Andrew J. Higgins, Daniel B. Szyld, Erik G. Boman, Ichitaro Yamazaki

    Abstract: CholeskyQR2 and shifted CholeskyQR3 are two state-of-the-art algorithms for computing tall-and-skinny QR factorizations since they attain high performance on current computer architectures. However, to guarantee stability, for some applications, CholeskyQR2 faces a prohibitive restriction on the condition number of the underlying matrix to factorize. Shifted CholeskyQR3 is stable but has $50\%$ mo…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: 27 pages

    MSC Class: 65F05; 65F20; 65F25; 65G50; 15B52

  3. arXiv:2304.04876  [pdf, other]

    math.NA cs.DC cs.MS

    An Experimental Study of Two-Level Schwarz Domain Decomposition Preconditioners on GPUs

    Authors: Ichitaro Yamazaki, Alexander Heinlein, Sivasankaran Rajamanickam

    Abstract: The generalized Dryja--Smith--Widlund (GDSW) preconditioner is a two-level overlapping Schwarz domain decomposition (DD) preconditioner that couples a classical one-level overlapping Schwarz preconditioner with an energy-minimizing coarse space. When used to accelerate the convergence rate of Krylov subspace iterative methods, the GDSW preconditioner provides robustness and scalability for the sol…

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: Accepted for publication in IPDPS'23

  4. arXiv:2109.01232  [pdf, other]

    cs.DC cs.MS math.NA

    A Study of Mixed Precision Strategies for GMRES on GPUs

    Authors: Jennifer A. Loe, Christian A. Glusa, Ichitaro Yamazaki, Erik G. Boman, Sivasankaran Rajamanickam

    Abstract: Support for lower precision computation is becoming more common in accelerator hardware due to lower power usage, reduced data movement and increased computational performance. However, computational science and engineering (CSE) problems require double precision accuracy in several domains. This conflict between hardware trends and application needs has resulted in a need for mixed precision stra…

    Submitted 2 September, 2021; originally announced September 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2105.07544

  5. arXiv:2105.07544  [pdf, other]

    math.NA cs.MS

    Experimental Evaluation of Multiprecision Strategies for GMRES on GPUs

    Authors: Jennifer A. Loe, Christian A. Glusa, Ichitaro Yamazaki, Erik G. Boman, Sivasankaran Rajamanickam

    Abstract: Support for lower precision computation is becoming more common in accelerator hardware due to lower power usage, reduced data movement and increased computational performance. However, computational science and engineering (CSE) problems require double precision accuracy in several domains. This conflict between hardware trends and application needs has resulted in a need for multiprecision strat…

    Submitted 16 May, 2021; originally announced May 2021.

    Comments: Accepted for publication in the IEEE IPDPS Accelerators and Hybrid Emerging Systems (AsHES) 11th Workshop, 2021

  6. arXiv:2104.01253  [pdf, other]

    math.NA

    Low-Synch Gram-Schmidt with Delayed Reorthogonalization for Krylov Solvers

    Authors: Daniel Bielich, Julien Langou, Stephen Thomas, Kasia Swirydowicz, Ichitaro Yamazaki, Erik G. Boman

    Abstract: The parallel strong-scaling of Krylov iterative methods is largely determined by the number of global reductions required at each iteration. The GMRES and Krylov-Schur algorithms employ the Arnoldi algorithm for nonsymmetric matrices. The underlying orthogonalization scheme is left-looking and processes one column at a time. Thus, at least one global reduction is required per iteration. The tradit…

    Submitted 15 May, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

    Comments: work is not ready yet, ongoing

  7. arXiv:2104.01196  [pdf, other]

    math.NA cs.MS

    Two-Stage Gauss--Seidel Preconditioners and Smoothers for Krylov Solvers on a GPU cluster

    Authors: Luc Berger-Vergiat, Brian Kelley, Sivasankaran Rajamanickam, Jonathan Hu, Katarzyna Swirydowicz, Paul Mullowney, Stephen Thomas, Ichitaro Yamazaki

    Abstract: Gauss-Seidel (GS) relaxation is often employed as a preconditioner for a Krylov solver or as a smoother for Algebraic Multigrid (AMG). However, the requisite sparse triangular solve is difficult to parallelize on many-core architectures such as graphics processing units (GPUs). In the present study, the performance of the traditional GS relaxation based on a triangular solve is compared with two-s…

    Submitted 24 April, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

  8. arXiv:2103.11991  [pdf, other]

    cs.MS

    Kokkos Kernels: Performance Portable Sparse/Dense Linear Algebra and Graph Kernels

    Authors: Sivasankaran Rajamanickam, Seher Acer, Luc Berger-Vergiat, Vinh Dang, Nathan Ellingwood, Evan Harvey, Brian Kelley, Christian R. Trott, Jeremiah Wilke, Ichitaro Yamazaki

    Abstract: As hardware architectures are evolving in the push towards exascale, developing Computational Science and Engineering (CSE) applications depend on performance portable approaches for sustainable software development. This paper describes one aspect of performance portability with respect to developing a portable library of kernels that serve the needs of several CSE applications and software frame…

    Submitted 22 March, 2021; originally announced March 2021.

    Report number: SAND2021-3421 O

  9. arXiv:2007.06674  [pdf, other]

    cs.MS math.NA

    A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic

    Authors: Ahmad Abdelfattah, Hartwig Anzt, Erik G. Boman, Erin Carson, Terry Cojean, Jack Dongarra, Mark Gates, Thomas Grützmacher, Nicholas J. Higham, Sherry Li, Neil Lindquist, Yang Liu, Jennifer Loe, Piotr Luszczek, Pratik Nayak, Sri Pranesh, Siva Rajamanickam, Tobias Ribizel, Barry Smith, Kasia Swirydowicz, Stephen Thomas, Stanimire Tomov, Yaohung M. Tsai, Ichitaro Yamazaki, Ulrike Meier Yang

    Abstract: Within the past years, hardware vendors have started designing low precision special function units in response to the demand of the Machine Learning community and their demand for high compute power in low precision formats. Also the server-line products are increasingly featuring low-precision special function units, such as the NVIDIA tensor cores in ORNL's Summit supercomputer providing more t…

    Submitted 13 July, 2020; originally announced July 2020.

    Comments: Technical report as a part of the Exascale computing project (ECP)

    ACM Class: G.1.3; G.4

  10. arXiv:1207.1773  [pdf]

    math.NA

    A hybrid Hermitian general eigenvalue solver

    Authors: Raffaele Solcà, Thomas C. Schulthess, Azzam Haidar, Stanimire Tomov, Ichitaro Yamazaki, Jack Dongarra

    Abstract: The adoption of hybrid GPU-CPU nodes in traditional supercomputing platforms opens acceleration opportunities for electronic structure calculations in materials science and chemistry applications, where medium sized Hermitian generalized eigenvalue problems must be solved many times. The small size of the problems limits the scalability on a distributed memory system, hence they can benefit from t…

    Submitted 7 July, 2012; originally announced July 2012.

  11. arXiv:0708.3098  [pdf, other]

    q-bio.GN

    CompostBin: A DNA composition-based algorithm for binning environmental shotgun reads

    Authors: Sourav Chatterji, Ichitaro Yamazaki, Zhaojun Bai, Jonathan Eisen

    Abstract: A major hindrance to studies of microbial diversity has been that the vast majority of microbes cannot be cultured in the laboratory and thus are not amenable to traditional methods of characterization. Environmental shotgun sequencing (ESS) overcomes this hurdle by sequencing the DNA from the organisms present in a microbial community. The interpretation of this metagenomic data can be greatly…

    Submitted 22 August, 2007; originally announced August 2007.

  12. arXiv:hep-ex/0508026  [pdf, ps, other]

    hep-ex physics.ins-det

    Efficient propagation of the polarization from laser photons to positrons through Compton scattering and electron-positron pair creation

    Authors: T. Omori, M. Fukuda, T. Hirose, Y. Kurihara, R. Kuroda, M. Nomura, A. Ohashi, T. Okugi, K. Sakaue, T. Saito, J. Urakawa, M. Washio, I. Yamazaki

    Abstract: We demonstrated for the first time the production of highly polarized short-pulse positrons with a finite energy spread in accordance with a new scheme that consists of two-quantum processes, such as inverse Compton scatterings and electron-positron pair creations. Using a circularly polarized laser beam of 532 nm scattered off a high-quality electron beam with the energy of 1.28 GeV, we obtaine…

    Submitted 20 February, 2006; v1 submitted 11 August, 2005; originally announced August 2005.

    Comments: 9 pages, 6 figures

    Report number: KEK Preprint 2005-56

    Journal ref: Phys.Rev.Lett. 96 (2006) 114801