
Showing 1–9 of 9 results for author: Yamazaki, I

Searching in archive math.
  1. arXiv:2402.15033  [pdf, other]

    math.NA cs.DC

    Two-Stage Block Orthogonalization to Improve Performance of $s$-step GMRES

    Authors: Ichitaro Yamazaki, Andrew J. Higgins, Erik G. Boman, Daniel B. Szyld

    Abstract: On current computer architectures, GMRES' performance can be limited by its communication cost to generate orthonormal basis vectors of the Krylov subspace. To address this performance bottleneck, its $s$-step variant orthogonalizes a block of $s$ basis vectors at a time, potentially reducing the communication cost by a factor of $s$. Unfortunately, for a large step size $s$, the solver can genera…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Accepted for publication in IPDPS'24
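    The communication argument in this abstract can be made concrete by counting global reductions, i.e. the inner-product synchronizations that would each be an all-reduce in a distributed run. The sketch below, assuming dense column vectors in pure Python, builds the monomial s-step basis [v, Av, ..., A^s v] with matrix-vector products only, then orthonormalizes it either column by column with classical Gram-Schmidt plus reorthogonalization (CGS2) or as one block with CholeskyQR2. The `Reductions` counter and all function names are hypothetical stand-ins, and block CholeskyQR2 is a plain substitute here, not the two-stage block orthogonalization the paper proposes.

```python
import math


class Reductions:
    """Hypothetical counter: each allreduce_dots call stands for one global
    all-reduce that would batch several inner products in a distributed run."""

    def __init__(self):
        self.count = 0

    def allreduce_dots(self, pairs):
        self.count += 1
        return [sum(a * b for a, b in zip(x, y)) for x, y in pairs]


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def monomial_basis(A, v, s):
    """Return [v, Av, ..., A^s v]: s matrix-vector products, no reductions."""
    basis = [list(v)]
    for _ in range(s):
        basis.append(matvec(A, basis[-1]))
    return basis


def cgs2_columnwise(V, red):
    """Column-by-column classical Gram-Schmidt with one reorthogonalization pass."""
    Q = []
    for v in V:
        w = list(v)
        for _ in range(2):  # project against the previous columns twice
            if Q:
                h = red.allreduce_dots([(q, w) for q in Q])
                for hk, q in zip(h, Q):
                    w = [wi - hk * qi for wi, qi in zip(w, q)]
        nrm = math.sqrt(red.allreduce_dots([(w, w)])[0])  # one more reduction
        Q.append([wi / nrm for wi in w])
    return Q


def cholqr(V, red):
    """One CholeskyQR pass on the whole block: a single reduction forms the Gram matrix."""
    m = len(V)
    flat = red.allreduce_dots([(V[i], V[j]) for i in range(m) for j in range(m)])
    G = [flat[i * m:(i + 1) * m] for i in range(m)]
    R = [[0.0] * m for _ in range(m)]
    for j in range(m):  # Cholesky G = R^T R; math.sqrt raises on a negative pivot
        R[j][j] = math.sqrt(G[j][j] - sum(R[k][j] ** 2 for k in range(j)))
        for i in range(j + 1, m):
            R[j][i] = (G[j][i] - sum(R[k][j] * R[k][i] for k in range(j))) / R[j][j]
    Q = []
    for j in range(m):  # Q = V R^{-1}, computed column by column
        w = list(V[j])
        for k in range(j):
            w = [wi - R[k][j] * qk for wi, qk in zip(w, Q[k])]
        Q.append([wi / R[j][j] for wi in w])
    return Q


def block_cholqr2(V, red):
    """Two CholeskyQR passes: two reductions regardless of the block size."""
    return cholqr(cholqr(V, red), red)


if __name__ == "__main__":
    n, s = 8, 3
    A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
         for i in range(n)]
    V = monomial_basis(A, [float(i + 1) for i in range(n)], s)
    red_col, red_blk = Reductions(), Reductions()
    cgs2_columnwise(V, red_col)
    block_cholqr2(V, red_blk)
    print("column-wise CGS2 reductions:", red_col.count)
    print("block CholeskyQR2 reductions:", red_blk.count)
```

    In this sketch the column-wise path costs 1 + 3s reductions (one norm for the first vector, then two projections and one norm for each later vector), while the block path costs 2 for any s. The monomial basis tends to become ill-conditioned as s grows, so larger step sizes put more stress on the block orthogonalization.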

  2. arXiv:2309.05868  [pdf, ps, other]

    math.NA

    Analysis of Randomized Householder-Cholesky QR Factorization with Multisketching

    Authors: Andrew J. Higgins, Daniel B. Szyld, Erik G. Boman, Ichitaro Yamazaki

    Abstract: CholeskyQR2 and shifted CholeskyQR3 are two state-of-the-art algorithms for computing tall-and-skinny QR factorizations since they attain high performance on current computer architectures. However, to guarantee stability, for some applications, CholeskyQR2 faces a prohibitive restriction on the condition number of the underlying matrix to factorize. Shifted CholeskyQR3 is stable but has $50\%$ mo…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: 27 pages

    MSC Class: 65F05; 65F20; 65F25; 65G50; 15B52
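    The CholeskyQR2 algorithm named in the abstract above can be sketched in a few lines. This is a minimal pure-Python illustration, assuming a dense tall-and-skinny matrix stored as a list of rows; the helper names (`gram`, `cholesky_upper`, `cholesky_qr2`, `orth_error`) are hypothetical, and the code shows only plain CholeskyQR2, not the paper's randomized Householder-Cholesky method with multisketching. One CholeskyQR pass forms the Gram matrix A^T A, factors it as R^T R, and sets Q = A R^{-1}; CholeskyQR2 repeats the pass on Q and multiplies the two triangular factors.

```python
import math


def gram(A):
    """Return G = A^T A (n x n) for an m x n matrix A given as a list of rows."""
    n = len(A[0])
    G = [[0.0] * n for _ in range(n)]
    for row in A:
        for i in range(n):
            for j in range(n):
                G[i][j] += row[i] * row[j]
    return G


def cholesky_upper(G):
    """Return upper-triangular R with R^T R = G; raises if G is not numerically SPD."""
    n = len(G)
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = G[j][j] - sum(R[k][j] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("Gram matrix is not numerically positive definite")
        R[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            R[j][i] = (G[j][i] - sum(R[k][j] * R[k][i] for k in range(j))) / R[j][j]
    return R


def solve_right_upper(A, R):
    """Return Q = A R^{-1} by forward substitution on each row of A (q R = a)."""
    n = len(R)
    Q = []
    for a in A:
        q = [0.0] * n
        for j in range(n):
            q[j] = (a[j] - sum(q[k] * R[k][j] for k in range(j))) / R[j][j]
        Q.append(q)
    return Q


def cholesky_qr(A):
    """One CholeskyQR pass: A = Q R with R from the Cholesky factor of A^T A."""
    R = cholesky_upper(gram(A))
    return solve_right_upper(A, R), R


def cholesky_qr2(A):
    """CholeskyQR2: a second pass on Q1 restores orthogonality; R = R2 R1."""
    Q1, R1 = cholesky_qr(A)
    Q, R2 = cholesky_qr(Q1)
    n = len(R1)
    R = [[sum(R2[i][k] * R1[k][j] for k in range(i, j + 1)) if j >= i else 0.0
          for j in range(n)] for i in range(n)]
    return Q, R


def orth_error(Q):
    """Largest entry of |Q^T Q - I|."""
    G = gram(Q)
    n = len(G)
    return max(abs(G[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    # Example input: an 8 x 4 monomial (Vandermonde) matrix on [0, 1].
    ts = [i / 7 for i in range(8)]
    A = [[t ** k for k in range(4)] for t in ts]
    print("CholeskyQR  orthogonality error:", orth_error(cholesky_qr(A)[0]))
    print("CholeskyQR2 orthogonality error:", orth_error(cholesky_qr2(A)[0]))
```

    Forming A^T A squares the condition number, so for ill-conditioned inputs the first pass can lose orthogonality or break down in the Cholesky step; this is the kind of condition-number restriction the abstract mentions for CholeskyQR2.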

  3. arXiv:2304.04876  [pdf, other]

    math.NA cs.DC cs.MS

    An Experimental Study of Two-Level Schwarz Domain Decomposition Preconditioners on GPUs

    Authors: Ichitaro Yamazaki, Alexander Heinlein, Sivasankaran Rajamanickam

    Abstract: The generalized Dryja--Smith--Widlund (GDSW) preconditioner is a two-level overlapping Schwarz domain decomposition (DD) preconditioner that couples a classical one-level overlapping Schwarz preconditioner with an energy-minimizing coarse space. When used to accelerate the convergence rate of Krylov subspace iterative methods, the GDSW preconditioner provides robustness and scalability for the sol…

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: Accepted for publication in IPDPS'23

  4. arXiv:2109.01232  [pdf, other]

    cs.DC cs.MS math.NA

    A Study of Mixed Precision Strategies for GMRES on GPUs

    Authors: Jennifer A. Loe, Christian A. Glusa, Ichitaro Yamazaki, Erik G. Boman, Sivasankaran Rajamanickam

    Abstract: Support for lower precision computation is becoming more common in accelerator hardware due to lower power usage, reduced data movement and increased computational performance. However, computational science and engineering (CSE) problems require double precision accuracy in several domains. This conflict between hardware trends and application needs has resulted in a need for mixed precision stra…

    Submitted 2 September, 2021; originally announced September 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2105.07544

  5. arXiv:2105.07544  [pdf, other]

    math.NA cs.MS

    Experimental Evaluation of Multiprecision Strategies for GMRES on GPUs

    Authors: Jennifer A. Loe, Christian A. Glusa, Ichitaro Yamazaki, Erik G. Boman, Sivasankaran Rajamanickam

    Abstract: Support for lower precision computation is becoming more common in accelerator hardware due to lower power usage, reduced data movement and increased computational performance. However, computational science and engineering (CSE) problems require double precision accuracy in several domains. This conflict between hardware trends and application needs has resulted in a need for multiprecision strat…

    Submitted 16 May, 2021; originally announced May 2021.

    Comments: Accepted for publication in the IEEE IPDPS Accelerators and Hybrid Emerging Systems (AsHES) 11th Workshop, 2021

  6. arXiv:2104.01253  [pdf, other]

    math.NA

    Low-Synch Gram-Schmidt with Delayed Reorthogonalization for Krylov Solvers

    Authors: Daniel Bielich, Julien Langou, Stephen Thomas, Kasia Swirydowicz, Ichitaro Yamazaki, Erik G. Boman

    Abstract: The parallel strong-scaling of Krylov iterative methods is largely determined by the number of global reductions required at each iteration. The GMRES and Krylov-Schur algorithms employ the Arnoldi algorithm for nonsymmetric matrices. The underlying orthogonalization scheme is left-looking and processes one column at a time. Thus, at least one global reduction is required per iteration. The tradit…

    Submitted 15 May, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

    Comments: work is not ready yet, ongoing

  7. arXiv:2104.01196  [pdf, other]

    math.NA cs.MS

    Two-Stage Gauss--Seidel Preconditioners and Smoothers for Krylov Solvers on a GPU cluster

    Authors: Luc Berger-Vergiat, Brian Kelley, Sivasankaran Rajamanickam, Jonathan Hu, Katarzyna Swirydowicz, Paul Mullowney, Stephen Thomas, Ichitaro Yamazaki

    Abstract: Gauss-Seidel (GS) relaxation is often employed as a preconditioner for a Krylov solver or as a smoother for Algebraic Multigrid (AMG). However, the requisite sparse triangular solve is difficult to parallelize on many-core architectures such as graphics processing units (GPUs). In the present study, the performance of the traditional GS relaxation based on a triangular solve is compared with two-s…

    Submitted 24 April, 2021; v1 submitted 2 April, 2021; originally announced April 2021.
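    The contrast drawn in this abstract, between GS relaxation built on an exact triangular solve and a two-stage variant, can be illustrated with a small dense sketch. It assumes the two-stage scheme replaces the solve with D + L by a fixed number of inner Jacobi-Richardson sweeps, g <- g + D^{-1}(r - (D + L)g), which need only matrix-vector products and diagonal scaling and so parallelize across rows. This is one common construction rather than a reproduction of the paper's GPU implementation, and the function names are hypothetical.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def lower_part(A):
    """Return T = D + L, the lower triangle of A including the diagonal."""
    n = len(A)
    return [[A[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]


def tri_solve(T, r):
    """Exact forward substitution with T = D + L; row i needs rows 0..i-1 first."""
    g = [0.0] * len(T)
    for i in range(len(T)):
        g[i] = (r[i] - sum(T[i][j] * g[j] for j in range(i))) / T[i][i]
    return g


def jacobi_richardson(T, r, sweeps):
    """Approximate T^{-1} r: start at D^{-1} r, then apply `sweeps` updates
    g <- g + D^{-1} (r - T g). Every entry of each update is independent."""
    g = [r[i] / T[i][i] for i in range(len(T))]
    for _ in range(sweeps):
        Tg = matvec(T, g)
        g = [g[i] + (r[i] - Tg[i]) / T[i][i] for i in range(len(T))]
    return g


def gauss_seidel(A, b, x, outer_sweeps, inner_sweeps=None):
    """GS relaxation x <- x + (D+L)^{-1} (b - A x).
    inner_sweeps=None: exact triangular solve (classical GS).
    inner_sweeps=k:    two-stage GS with k inner Jacobi-Richardson sweeps."""
    T = lower_part(A)
    for _ in range(outer_sweeps):
        Ax = matvec(A, x)
        r = [bi - axi for bi, axi in zip(b, Ax)]
        if inner_sweeps is None:
            d = tri_solve(T, r)
        else:
            d = jacobi_richardson(T, r, inner_sweeps)
        x = [xi + di for xi, di in zip(x, d)]
    return x


if __name__ == "__main__":
    A = [[4.0, -1.0, 0.0, 0.0],
         [-1.0, 4.0, -1.0, 0.0],
         [0.0, -1.0, 4.0, -1.0],
         [0.0, 0.0, -1.0, 3.0]]
    b = [1.0, 2.0, 3.0, 4.0]
    for k in (None, 1, 2):
        x = gauss_seidel(A, b, [0.0] * 4, outer_sweeps=10, inner_sweeps=k)
        res = max(abs(bi - ai) for bi, ai in zip(b, matvec(A, x)))
        print("inner_sweeps =", k, "max residual:", res)
```

    Because D^{-1}L is strictly lower triangular and therefore nilpotent, n - 1 inner sweeps after the initial D^{-1}r reproduce the exact triangular solve for an n x n matrix (up to rounding). With only a few sweeps, each outer iteration is less effective but fully parallel.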

  8. arXiv:2007.06674  [pdf, other]

    cs.MS math.NA

    A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic

    Authors: Ahmad Abdelfattah, Hartwig Anzt, Erik G. Boman, Erin Carson, Terry Cojean, Jack Dongarra, Mark Gates, Thomas Grützmacher, Nicholas J. Higham, Sherry Li, Neil Lindquist, Yang Liu, Jennifer Loe, Piotr Luszczek, Pratik Nayak, Sri Pranesh, Siva Rajamanickam, Tobias Ribizel, Barry Smith, Kasia Swirydowicz, Stephen Thomas, Stanimire Tomov, Yaohung M. Tsai, Ichitaro Yamazaki, Ulrike Meier Yang

    Abstract: Within the past years, hardware vendors have started designing low precision special function units in response to the demand of the Machine Learning community and their demand for high compute power in low precision formats. Also the server-line products are increasingly featuring low-precision special function units, such as the NVIDIA tensor cores in ORNL's Summit supercomputer providing more t…

    Submitted 13 July, 2020; originally announced July 2020.

    Comments: Technical report as a part of the Exascale computing project (ECP)

    ACM Class: G.1.3; G.4

  9. arXiv:1207.1773  [pdf]

    math.NA

    A hybrid Hermitian general eigenvalue solver

    Authors: Raffaele Solcà, Thomas C. Schulthess, Azzam Haidar, Stanimire Tomov, Ichitaro Yamazaki, Jack Dongarra

    Abstract: The adoption of hybrid GPU-CPU nodes in traditional supercomputing platforms opens acceleration opportunities for electronic structure calculations in materials science and chemistry applications, where medium sized Hermitian generalized eigenvalue problems must be solved many times. The small size of the problems limits the scalability on a distributed memory system, hence they can benefit from t…

    Submitted 7 July, 2012; originally announced July 2012.