Showing 1–5 of 5 results for author: Kurzak, J

Searching in archive cs.
  1. arXiv:2310.01586  [pdf, other]

    cs.DC

    Experiences Readying Applications for Exascale

    Authors: Paul T. Bauman, Reuben D. Budiardja, Dmytro Bykov, Noel Chalmers, Jacqueline Chen, Nicholas Curtis, Marc Day, Markus Eisenbach, Lucas Esclapez, Alessandro Fanfarillo, William Freitag, Nicholas Frontiere, Antigoni Georgiadou, Joseph Glenski, Kalyana Gottiparthi, Marc T. Henry de Frahan, Gustav R. Jansen, Wayne Joubert, Justin G. Lietz, Jakub Kurzak, Nicholas Malaya, Bronson Messer, Damon McDougall, Paul Mullowney, Stephen Nichols, et al. (7 additional authors not shown)

    Abstract: The advent of exascale computing invites an assessment of existing best practices for developing application readiness on the world's largest supercomputers. This work details observations from the last four years in preparing scientific applications to run on the Oak Ridge Leadership Computing Facility's (OLCF) Frontier system. This paper addresses a range of topics in software including programm…

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: Accepted at SC23

  2. arXiv:2304.10397  [pdf, ps, other]

    cs.DC math.NA

    Optimizing High-Performance Linpack for Exascale Accelerated Architectures

    Authors: Noel Chalmers, Jakub Kurzak, Damon McDougall, Paul T. Bauman

    Abstract: We detail the performance optimizations made in rocHPL, AMD's open-source implementation of the High-Performance Linpack (HPL) benchmark targeting accelerated node architectures designed for exascale systems such as the Frontier supercomputer. The implementation leverages the high-throughput GPU accelerators on the node via highly optimized linear algebra libraries, as well as the entire CPU socke…

    Submitted 20 April, 2023; originally announced April 2023.

  3. arXiv:1002.4057  [pdf, ps, other]

    cs.MS math.NA

    Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures

    Authors: Emmanuel Agullo, Henricus Bouwmeester, Jack Dongarra, Jakub Kurzak, Julien Langou, Lee Rosenberg

    Abstract: The algorithms in the current sequential numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multicore architectures. A new family of algorithms, the tile algorithms, has recently been introduced. Previous research has shown that it is possible to write efficient and scalable tile algorithms for performing a Cholesky factorization, a (pseudo) LU factorization, and a QR fa…

    Submitted 22 February, 2010; originally announced February 2010.

    Comments: 8 pages, extended abstract submitted to VecPar10 on 12/11/09, notification of acceptance received on 02/05/10. See: http://vecpar.fe.up.pt/2010/

  4. Accelerating Scientific Computations with Mixed Precision Algorithms

    Authors: Marc Baboulin, Alfredo Buttari, Jack Dongarra, Jakub Kurzak, Julie Langou, Julien Langou, Piotr Luszczek, Stanimire Tomov

    Abstract: On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here ca…

    Submitted 20 August, 2008; originally announced August 2008.

  5. arXiv:0709.1272  [pdf, other]

    cs.MS cs.DC

    A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures

    Authors: Alfredo Buttari, Julien Langou, Jakub Kurzak, Jack Dongarra

    Abstract: As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these new processors. Fine grain parallelism becomes a major requirement and introduces the necessity of loose synchronization in the parallel execution of an oper…

    Submitted 12 June, 2008; v1 submitted 9 September, 2007; originally announced September 2007.

    Report number: LAPACK Working Note 191