Showing 1–14 of 14 results for author: Swirydowicz, K

  1. arXiv:2403.00232

    cs.AR

    FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators

    Authors: Xinyi Li, Ang Li, Bo Fang, Katarzyna Swirydowicz, Ignacio Laguna, Ganesh Gopalakrishnan

    Abstract: NVIDIA Tensor Cores and AMD Matrix Cores (together called Matrix Accelerators) are of growing interest in high-performance computing and machine learning owing to their high performance. Unfortunately, their numerical behaviors are not publicly documented, including the number of extra precision bits maintained, the accumulation order of addition, and predictable subnormal number handling during c…

    Submitted 29 February, 2024; originally announced March 2024.

  2. arXiv:2401.13926

    cs.CE eess.SY math.NA

    Iterative Methods in GPU-Resident Linear Solvers for Nonlinear Constrained Optimization

    Authors: Kasia Świrydowicz, Nicholson Koukpaizan, Maksudul Alam, Shaked Regev, Michael Saunders, Slaven Peleš

    Abstract: Linear solvers are major computational bottlenecks in a wide range of decision support and optimization computations. The challenges become even more pronounced on heterogeneous hardware, where traditional sparse numerical linear algebra methods are often inefficient. For example, methods for solving ill-conditioned linear systems have relied on conditional branching, which degrades performance on…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: 15 pages, 8 figures, 5 tables

    MSC Class: 65F05; 65F10; 65F50; 65K10; 65Y05; 65Y10; 90C51

  3. arXiv:2306.14337

    cs.CE

    GPU-Resident Sparse Direct Linear Solvers for Alternating Current Optimal Power Flow Analysis

    Authors: Kasia Świrydowicz, Nicholson Koukpaizan, Tobias Ribizel, Fritz Göbel, Shrirang Abhyankar, Hartwig Anzt, Slaven Peleš

    Abstract: Integrating renewable resources within the transmission grid at a wide scale poses significant challenges for economic dispatch as it requires analysis with more optimization parameters, constraints, and sources of uncertainty. This motivates the investigation of more efficient computational methods, especially those for solving the underlying linear systems, which typically take more than half of…

    Submitted 15 August, 2023; v1 submitted 25 June, 2023; originally announced June 2023.

    MSC Class: 65F05; 65F10; 65F50; 65K10; 65Y05; 65Y10; 90C51

  4. Towards Efficient Alternating Current Optimal Power Flow Analysis on Graphical Processing Units

    Authors: Kasia Swirydowicz, Nicholson Koukpaizan, Shrirang Abhyankar, Slaven Peles

    Abstract: We present a solution of sparse alternating current optimal power flow (ACOPF) analysis on graphical processing unit (GPU). In particular, we discuss the performance bottlenecks and detail our efforts to accelerate the linear solver, a core component of ACOPF that dominates the computational time. ACOPF analyses of two large-scale systems, synthetic Northeast (25,000 buses) and Eastern (70,000 bus…

    Submitted 5 May, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

  5. arXiv:2112.14681

    math.NA cs.MS

    Neumann Series in GMRES and Algebraic Multigrid Smoothers

    Authors: Stephen Thomas, Arielle Carr, Paul Mullowney, Ruipeng Li, Kasia Świrydowicz

    Abstract: Neumann series underlie both Krylov methods and algebraic multigrid smoothers. A low-synch modified Gram-Schmidt (MGS)-GMRES algorithm is described that employs a Neumann series to accelerate the projection step. A corollary to the backward stability result of Paige et al. (2006) demonstrates that the truncated Neumann series approximation is sufficient for convergence of GMRES. The lower triangul…

    Submitted 29 December, 2021; originally announced December 2021.

  6. arXiv:2111.09512

    math.NA cs.MS

    ILU Smoothers for Low Mach Navier-Stokes Pressure Solvers

    Authors: Stephen Thomas, Arielle Carr, Paul Mullowney, Kasia Świrydowicz, Marc Day

    Abstract: Incomplete LU (ILU) smoothers are effective in the algebraic multigrid (AMG) $V$-cycle for reducing high-frequency components of the error. However, the requisite direct triangular solves are comparatively slow on GPUs. Previous work has demonstrated the advantages of Jacobi iteration as an alternative to direct solution of these systems. Depending on the threshold and fill-level parameters chosen…

    Submitted 27 November, 2023; v1 submitted 17 November, 2021; originally announced November 2021.

    Comments: v2 updated citation information; v3 updated results; v4 abstract updated, new results added; v5 new experimental analysis and results added; v6 shortened theory, improved discussion of applications; v7 final version

  7. arXiv:2110.03636

    math.OC cs.DC

    A Hybrid Direct-Iterative Method for Solving KKT Linear Systems

    Authors: Shaked Regev, Nai-Yuan Chiang, Eric Darve, Cosmin G. Petra, Michael A. Saunders, Kasia Świrydowicz, Slaven Peleš

    Abstract: We propose a solution strategy for linear systems arising in interior method optimization, which is suitable for implementation on hardware accelerators such as graphical processing units (GPUs). The current gold standard for solving these systems is the LDL^T factorization. However, LDL^T requires pivoting during factorization, which substantially increases communication cost and degrades perform…

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: 22 pages, 9 figures, 7 tables

    MSC Class: 15; 65; 68 ACM Class: G.1

  8. arXiv:2109.04996

    cs.DC cs.MS math.NA

    Efficient Exascale Discretizations: High-Order Finite Element Methods

    Authors: Tzanio Kolev, Paul Fischer, Misun Min, Jack Dongarra, Jed Brown, Veselin Dobrev, Tim Warburton, Stanimire Tomov, Mark S. Shephard, Ahmad Abdelfattah, Valeria Barra, Natalie Beams, Jean-Sylvain Camier, Noel Chalmers, Yohann Dudouit, Ali Karakus, Ian Karlin, Stefan Kerkemeier, Yu-Hsiang Lan, David Medina, Elia Merzari, Aleksandr Obabko, Will Pazner, Thilina Rathnayake, Cameron W. Smith, et al. (5 additional authors not shown)

    Abstract: Efficient exploitation of exascale architectures requires rethinking of the numerical algorithms used in many large-scale applications. These architectures favor algorithms that expose ultra fine-grain parallelism and maximize the ratio of floating point operations to energy intensive data movement. One of the few viable approaches to achieve high efficiency in the area of PDE discretizations on u…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: 22 pages, 18 figures

  9. Linear solvers for power grid optimization problems: a review of GPU-accelerated linear solvers

    Authors: Kasia Swirydowicz, Eric Darve, Wesley Jones, Jonathan Maack, Shaked Regev, Michael A. Saunders, Stephen J. Thomas, Slaven Peles

    Abstract: The linear equations that arise in interior methods for constrained optimization are sparse symmetric indefinite and become extremely ill-conditioned as the interior method converges. These linear systems present a challenge for existing solver frameworks based on sparse LU or LDL^T decompositions. We benchmark five well known direct linear solver packages using matrices extracted from power grid…

    Submitted 13 August, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

  10. arXiv:2104.01196

    math.NA cs.MS

    Two-Stage Gauss-Seidel Preconditioners and Smoothers for Krylov Solvers on a GPU cluster

    Authors: Luc Berger-Vergiat, Brian Kelley, Sivasankaran Rajamanickam, Jonathan Hu, Katarzyna Swirydowicz, Paul Mullowney, Stephen Thomas, Ichitaro Yamazaki

    Abstract: Gauss-Seidel (GS) relaxation is often employed as a preconditioner for a Krylov solver or as a smoother for Algebraic Multigrid (AMG). However, the requisite sparse triangular solve is difficult to parallelize on many-core architectures such as graphics processing units (GPUs). In the present study, the performance of the traditional GS relaxation based on a triangular solve is compared with two-s…

    Submitted 24 April, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

  11. arXiv:2007.06674

    cs.MS math.NA

    A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic

    Authors: Ahmad Abdelfattah, Hartwig Anzt, Erik G. Boman, Erin Carson, Terry Cojean, Jack Dongarra, Mark Gates, Thomas Grützmacher, Nicholas J. Higham, Sherry Li, Neil Lindquist, Yang Liu, Jennifer Loe, Piotr Luszczek, Pratik Nayak, Sri Pranesh, Siva Rajamanickam, Tobias Ribizel, Barry Smith, Kasia Swirydowicz, Stephen Thomas, Stanimire Tomov, Yaohung M. Tsai, Ichitaro Yamazaki, Ulrike Meier Yang

    Abstract: Within the past years, hardware vendors have started designing low precision special function units in response to the demand of the Machine Learning community and their demand for high compute power in low precision formats. Also the server-line products are increasingly featuring low-precision special function units, such as the NVIDIA tensor cores in ORNL's Summit supercomputer providing more t…

    Submitted 13 July, 2020; originally announced July 2020.

    Comments: Technical report as a part of the Exascale computing project (ECP)

    ACM Class: G.1.3; G.4

  12. arXiv:2004.06722

    cs.PF cs.DC

    Scalability of High-Performance PDE Solvers

    Authors: Paul Fischer, Misun Min, Thilina Rathnayake, Som Dutta, Tzanio Kolev, Veselin Dobrev, Jean-Sylvain Camier, Martin Kronbichler, Tim Warburton, Kasia Swirydowicz, Jed Brown

    Abstract: Performance tests and analyses are critical to effective HPC software development and are central components in the design and implementation of computational algorithms for achieving faster simulations on existing and future computing architectures for large-scale application problems. In this paper, we explore performance and space-time trade-offs for important compute-intensive kernels of large…

    Submitted 14 April, 2020; originally announced April 2020.

    Comments: 25 pages, 54 figures

    MSC Class: 35-04 ACM Class: D.0; F.2; G.2; G.4; I.6

  13. arXiv:1801.00246

    math.NA cs.DC cs.PF physics.comp-ph physics.flu-dyn

    A GPU Accelerated Discontinuous Galerkin Incompressible Flow Solver

    Authors: Ali Karakus, Noel Chalmers, Kasia Swirydowicz, Timothy Warburton

    Abstract: We present a GPU-accelerated version of a high-order discontinuous Galerkin discretization of the unsteady incompressible Navier-Stokes equations. The equations are discretized in time using a semi-implicit scheme with explicit treatment of the nonlinear term and implicit treatment of the split Stokes operators. The pressure system is solved with a conjugate gradient method together with a fully G…

    Submitted 7 May, 2018; v1 submitted 31 December, 2017; originally announced January 2018.

    Comments: 33 pages, 10 figures

  14. arXiv:1711.00903

    cs.MS cs.DC cs.PF math.NA

    Acceleration of tensor-product operations for high-order finite element methods

    Authors: Kasia Świrydowicz, Noel Chalmers, Ali Karakus, Timothy Warburton

    Abstract: This paper is devoted to GPU kernel optimization and performance analysis of three tensor-product operators arising in finite element methods. We provide a mathematical background to these operations and implementation details. Achieving close-to-the-peak performance for these operators requires extensive optimization because of the operators' properties: low arithmetic intensity, tiered structure…

    Submitted 13 November, 2017; v1 submitted 2 November, 2017; originally announced November 2017.

    Comments: 31 pages, 11 figures