
Showing 1–9 of 9 results for author: Reguly, I Z

  1. arXiv:2309.10075  [pdf, other]

    cs.PF cs.DC

    Evaluating the performance portability of SYCL across CPUs and GPUs on bandwidth-bound applications

    Authors: Istvan Z Reguly

    Abstract: In this paper, we evaluate the portability of the SYCL programming model on some of the latest CPUs and GPUs from a wide range of vendors, utilizing the two main compilers: DPC++ and hipSYCL/OpenSYCL. Both compilers currently support GPUs from all three major vendors; we evaluate performance on the Intel(R) Data Center GPU Max 1100, the NVIDIA A100 GPU, and the AMD MI250X GPU. Support on CPUs curr…

    Submitted 18 September, 2023; originally announced September 2023.

  2. arXiv:2309.09084  [pdf, other]

    cs.PF cs.DC

    Comparative evaluation of bandwidth-bound applications on the Intel Xeon CPU MAX Series

    Authors: Istvan Z Reguly

    Abstract: In this paper we explore the performance of Intel Xeon MAX CPU Series, representing the most significant new variation upon the classical CPU architecture since the Intel Xeon Phi Processor. Given the availability of a large on-package high-bandwidth memory, the bandwidth-to-compute ratio has significantly shifted compared to other CPUs on the market. Since a large fraction of HPC workloads are se…

    Submitted 16 September, 2023; originally announced September 2023.

  3. arXiv:2201.03950  [pdf, other]

    cs.DC cs.AR

    High Throughput Multidimensional Tridiagonal Systems Solvers on FPGAs

    Authors: Kamalavasan Kamalakkannan, Istvan Z. Reguly, Suhaib A. Fahmy, Gihan R. Mudalige

    Abstract: We present a design space exploration for synthesizing optimized, high-throughput implementations of multiple multi-dimensional tridiagonal system solvers on FPGAs. Re-evaluating the characteristics of algorithms for the direct solution of tridiagonal systems, we develop a new tridiagonal solver library aimed at implementing high-performance computing applications on Xilinx FPGA hardware. Key new…

    Submitted 11 January, 2022; originally announced January 2022.

    Comments: Under review

  4. arXiv:2101.01177  [pdf, other]

    cs.AR cs.DC cs.PF

    High-Level FPGA Accelerator Design for Structured-Mesh-Based Explicit Numerical Solvers

    Authors: Kamalavasan Kamalakkannan, Gihan R. Mudalige, Istvan Z. Reguly, Suhaib A. Fahmy

    Abstract: This paper presents a workflow for synthesizing near-optimal FPGA implementations for structured-mesh based stencil applications for explicit solvers. It leverages key characteristics of the application class, its computation-communication pattern, and the architectural capabilities of the FPGA to accelerate solvers from the high-performance computing domain. Key new features of the workflow are (…

    Submitted 7 January, 2021; v1 submitted 4 January, 2021; originally announced January 2021.

    Comments: Preprint - Accepted to the 35th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2021), May 2021, Portland, Oregon USA

  5. arXiv:1802.03749  [pdf, other]

    cs.MS cs.DC

    Locality Optimized Unstructured Mesh Algorithms on GPUs

    Authors: András Attila Sulyok, Gábor Dániel Balogh, István Zoltán Reguly, Gihan R. Mudalige

    Abstract: Unstructured-mesh based numerical algorithms such as finite volume and finite element algorithms form an important class of applications for many scientific and engineering domains. The key difficulty in achieving higher performance from these applications is the indirect accesses that lead to data-races when parallelized. Current methods for handling such data-races lead to reduced parallelism an…

    Submitted 27 July, 2019; v1 submitted 11 February, 2018; originally announced February 2018.

    Comments: Number of pages: 36 Number of figures: 21 Submitted to JPDC

  6. arXiv:1711.01845  [pdf, other]

    cs.PF

    Comparison of Parallelisation Approaches, Languages, and Compilers for Unstructured Mesh Algorithms on GPUs

    Authors: G. D. Balogh, I. Z. Reguly, G. R. Mudalige

    Abstract: Efficiently exploiting GPUs is increasingly essential in scientific computing, as many current and upcoming supercomputers are built using them. To facilitate this, there are a number of programming approaches, such as CUDA, OpenACC and OpenMP 4, supporting different programming languages (mainly C/C++ and Fortran). There are also several compiler suites (clang, nvcc, PGI, XL) each supporting diff…

    Submitted 6 November, 2017; originally announced November 2017.

  7. arXiv:1709.02125  [pdf, other]

    cs.DC

    Beyond 16GB: Out-of-Core Stencil Computations

    Authors: Istvan Z Reguly, Gihan R Mudalige, Michael B Giles

    Abstract: Stencil computations are a key class of applications, widely used in the scientific computing community, and a class that has particularly benefited from performance improvements on architectures with high memory bandwidth. Unfortunately, such architectures come with a limited amount of fast memory, which is limiting the size of the problems that can be efficiently solved. In this paper, we addres…

    Submitted 26 October, 2017; v1 submitted 7 September, 2017; originally announced September 2017.

  8. Loop Tiling in Large-Scale Stencil Codes at Run-time with OPS

    Authors: Istvan Z Reguly, Gihan R Mudalige, Mike B Giles

    Abstract: The key common bottleneck in most stencil codes is data movement, and prior research has shown that improving data locality through optimisations that schedule across loops do particularly well. However, in many large PDE applications it is not possible to apply such optimisations through compilers because there are many options, execution paths and data per grid point, many dependent on run-time…

    Submitted 26 June, 2017; v1 submitted 3 April, 2017; originally announced April 2017.

  9. Acceleration of a Full-scale Industrial CFD Application with OP2

    Authors: István Z. Reguly, Gihan R. Mudalige, Carlo Bertolli, Michael B. Giles, Adam Betts, Paul H. J. Kelly, David Radford

    Abstract: Hydra is a full-scale industrial CFD application used for the design of turbomachinery at Rolls Royce plc. It consists of over 300 parallel loops with a code base exceeding 50K lines and is capable of performing complex simulations over highly detailed unstructured mesh geometries. Unlike simpler structured-mesh applications, which feature high speed-ups when accelerated by modern processor archit…

    Submitted 27 March, 2014; originally announced March 2014.

    Comments: Submitted to ACM Transactions on Parallel Computing

    ACM Class: C.4

    Journal ref: IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 5, pp. 1265-1278, May 1 2016. doi: 10.1109/TPDS.2015.2453972