
Showing 1–11 of 11 results for author: Boman, E G

Searching in archive cs.
  1. arXiv:2402.15033  [pdf, other]

    math.NA cs.DC

    Two-Stage Block Orthogonalization to Improve Performance of $s$-step GMRES

    Authors: Ichitaro Yamazaki, Andrew J. Higgins, Erik G. Boman, Daniel B. Szyld

    Abstract: On current computer architectures, GMRES' performance can be limited by its communication cost to generate orthonormal basis vectors of the Krylov subspace. To address this performance bottleneck, its $s$-step variant orthogonalizes a block of $s$ basis vectors at a time, potentially reducing the communication cost by a factor of $s$. Unfortunately, for a large step size $s$, the solver can genera…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Accepted for publication in IPDPS'24

  2. arXiv:2304.13194  [pdf, other]

    cs.DC cs.DM

    Jet: Multilevel Graph Partitioning on Graphics Processing Units

    Authors: Michael S. Gilbert, Kamesh Madduri, Erik G. Boman, Sivasankaran Rajamanickam

    Abstract: The multilevel heuristic is the dominant strategy for high-quality sequential and parallel graph partitioning. Partition refinement is a key step of multilevel graph partitioning. In this work, we present Jet, a new parallel algorithm for partition refinement specifically designed for Graphics Processing Units (GPUs). We combine Jet with GPU-aware coarsening to develop a $k$-way graph partitioner,…

    Submitted 5 January, 2024; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: To appear in SIAM SISC journal

  3. arXiv:2109.01232  [pdf, other]

    cs.DC cs.MS math.NA

    A Study of Mixed Precision Strategies for GMRES on GPUs

    Authors: Jennifer A. Loe, Christian A. Glusa, Ichitaro Yamazaki, Erik G. Boman, Sivasankaran Rajamanickam

    Abstract: Support for lower precision computation is becoming more common in accelerator hardware due to lower power usage, reduced data movement and increased computational performance. However, computational science and engineering (CSE) problems require double precision accuracy in several domains. This conflict between hardware trends and application needs has resulted in a need for mixed precision stra…

    Submitted 2 September, 2021; originally announced September 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2105.07544

  4. arXiv:2107.00075  [pdf, other]

    cs.DC cs.DM

    Parallel Graph Coloring Algorithms for Distributed GPU Environments

    Authors: Ian Bogle, Erik G Boman, Karen D Devine, Sivasankaran Rajamanickam, George M Slota

    Abstract: Graph coloring is often used in parallelizing scientific computations that run in distributed and multi-GPU environments; it identifies sets of independent data that can be updated in parallel. Many algorithms exist for graph coloring on a single GPU or in distributed memory, but to the best of our knowledge, hybrid MPI+GPU algorithms have been unexplored until this work. We present several MPI+GP…

    Submitted 30 June, 2021; originally announced July 2021.

    Comments: Submitted to Parallel Computing

  5. arXiv:2105.07544  [pdf, other]

    math.NA cs.MS

    Experimental Evaluation of Multiprecision Strategies for GMRES on GPUs

    Authors: Jennifer A. Loe, Christian A. Glusa, Ichitaro Yamazaki, Erik G. Boman, Sivasankaran Rajamanickam

    Abstract: Support for lower precision computation is becoming more common in accelerator hardware due to lower power usage, reduced data movement and increased computational performance. However, computational science and engineering (CSE) problems require double precision accuracy in several domains. This conflict between hardware trends and application needs has resulted in a need for multiprecision strat…

    Submitted 16 May, 2021; originally announced May 2021.

    Comments: Accepted for publication in the IEEE IPDPS Accelerators and Hybrid Emerging Systems (AsHES) 11th Workshop, 2021

  6. arXiv:2105.00578  [pdf, other]

    cs.DC cs.DM cs.MS

    Sphynx: a parallel multi-GPU graph partitioner for distributed-memory systems

    Authors: Seher Acer, Erik G Boman, Christian A Glusa, Sivasankaran Rajamanickam

    Abstract: Graph partitioning has been an important tool to partition the work among several processors to minimize the communication cost and balance the workload. While accelerator-based supercomputers are emerging to be the standard, the use of graph partitioning becomes even more important as applications are rapidly moving to these architectures. However, there is no distributed-memory parallel, multi-G…

    Submitted 2 May, 2021; originally announced May 2021.

    Comments: To appear in Parallel Computing

    Report number: SAND2021-0352-O

    MSC Class: 68W10

  7. arXiv:2007.06674  [pdf, other]

    cs.MS math.NA

    A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic

    Authors: Ahmad Abdelfattah, Hartwig Anzt, Erik G. Boman, Erin Carson, Terry Cojean, Jack Dongarra, Mark Gates, Thomas Grützmacher, Nicholas J. Higham, Sherry Li, Neil Lindquist, Yang Liu, Jennifer Loe, Piotr Luszczek, Pratik Nayak, Sri Pranesh, Siva Rajamanickam, Tobias Ribizel, Barry Smith, Kasia Swirydowicz, Stephen Thomas, Stanimire Tomov, Yaohung M. Tsai, Ichitaro Yamazaki, Ulrike Meier Yang

    Abstract: Within the past years, hardware vendors have started designing low precision special function units in response to the demand of the Machine Learning community and their demand for high compute power in low precision formats. Also the server-line products are increasingly featuring low-precision special function units, such as the NVIDIA tensor cores in ORNL's Summit supercomputer providing more t…

    Submitted 13 July, 2020; originally announced July 2020.

    Comments: Technical report as a part of the Exascale computing project (ECP)

    ACM Class: G.1.3; G.4

  8. arXiv:2005.12414  [pdf, other]

    cs.DS

    On Optimal Partitioning For Sparse Matrices In Variable Block Row Format

    Authors: Willow Ahrens, Erik G. Boman

    Abstract: The Variable Block Row (VBR) format is an influential blocked sparse matrix format designed for matrices with shared sparsity structure between adjacent rows and columns. VBR groups adjacent rows and columns, storing the resulting blocks that contain nonzeros in a dense format. This reduces the memory footprint and enables optimizations such as register blocking and instruction-level parallelism…

    Submitted 25 May, 2021; v1 submitted 25 May, 2020; originally announced May 2020.

    Comments: 22 pages; added experimental results for VBR, updated presentation of results

  9. arXiv:1808.08172  [pdf, other]

    math.NA cs.DC

    Asynchronous One-Level and Two-Level Domain Decomposition Solvers

    Authors: Christian Glusa, Paritosh Ramanan, Erik G. Boman, Edmond Chow, Sivasankaran Rajamanickam

    Abstract: Parallel implementations of linear iterative solvers generally alternate between phases of data exchange and phases of local computation. Increasingly large problem sizes on more heterogeneous systems make load balancing and network layout very challenging tasks. In particular, global communication patterns such as inner products become increasingly limiting at scale. We explore the use of asynchr…

    Submitted 10 August, 2020; v1 submitted 24 August, 2018; originally announced August 2018.

    MSC Class: 68W10; 65Y05; 68W15; 65N55

  10. arXiv:1712.07297  [pdf, other]

    math.NA cs.MS

    A distributed-memory hierarchical solver for general sparse linear systems

    Authors: Chao Chen, Hadi Pouransari, Sivasankaran Rajamanickam, Erik G. Boman, Eric Darve

    Abstract: We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct s…

    Submitted 19 December, 2017; originally announced December 2017.

    MSC Class: 65F50

  11. arXiv:1505.00875  [pdf, other]

    cs.DS

    Evaluating the Potential of a Dual Randomized Kaczmarz Solver for Laplacian Linear Systems

    Authors: Erik G. Boman, Kevin Deweese, John R. Gilbert

    Abstract: A new method for solving Laplacian linear systems proposed by Kelner et al. involves the random sampling and update of fundamental cycles in a graph. Kelner et al. proved asymptotic bounds on the complexity of this method but did not report experimental results. We seek to both evaluate the performance of this approach and to explore improvements to it in practice. We compare the performance of th…

    Submitted 6 October, 2015; v1 submitted 4 May, 2015; originally announced May 2015.

    Comments: increased font size in figures for readability, added weak scaling figures, improved citations to application areas, changed terminology slightly from network graphs to irregular graphs