Showing 1–6 of 6 results for author: Valero-Lara, P

  1. Julia as a unifying end-to-end workflow language on the Frontier exascale system

    Authors: William F. Godoy, Pedro Valero-Lara, Caira Anderson, Katrina W. Lee, Ana Gainaru, Rafael Ferreira da Silva, Jeffrey S. Vetter

    Abstract: We evaluate Julia as a single language and ecosystem paradigm powered by LLVM to develop workflow components for high-performance computing. We run a Gray-Scott, 2-variable diffusion-reaction application using a memory-bound, 7-point stencil kernel on Frontier, the US Department of Energy's first exascale supercomputer. We evaluate the performance, scaling, and trade-offs of (i) the computational…

    Submitted 27 September, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: 11 pages, 8 figures, accepted at the 18th Workshop on Workflows in Support of Large-Scale Science (WORKS23), IEEE/ACM The International Conference for High Performance Computing, Networking, Storage, and Analysis, SC23

  2. arXiv:2309.07103

    cs.SE cs.AI cs.DC cs.PL

    Comparing Llama-2 and GPT-3 LLMs for HPC kernels generation

    Authors: Pedro Valero-Lara, Alexis Huante, Mustafa Al Lail, William F. Godoy, Keita Teranishi, Prasanna Balaprakash, Jeffrey S. Vetter

    Abstract: We evaluate the use of the open-source Llama-2 model for generating well-known, high-performance computing kernels (e.g., AXPY, GEMV, GEMM) on different parallel programming models and languages (e.g., C++: OpenMP, OpenMP Offload, OpenACC, CUDA, HIP; Fortran: OpenMP, OpenMP Offload, OpenACC; Python: numpy, Numba, pyCUDA, cuPy; and Julia: Threads, CUDA.jl, AMDGPU.jl). We built upon our previous wor…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: Accepted at LCPC 2023, The 36th International Workshop on Languages and Compilers for Parallel Computing http://www.lcpcworkshop.org/LCPC23/ . 13 pages, 5 figures, 1 table

  3. arXiv:2306.15121

    cs.AI cs.ET cs.PL

    Evaluation of OpenAI Codex for HPC Parallel Programming Models Kernel Generation

    Authors: William F. Godoy, Pedro Valero-Lara, Keita Teranishi, Prasanna Balaprakash, Jeffrey S. Vetter

    Abstract: We evaluate AI-assisted generative capabilities on fundamental numerical kernels in high-performance computing (HPC), including AXPY, GEMV, GEMM, SpMV, Jacobi Stencil, and CG. We test the generated kernel codes for a variety of language-supported programming models, including (1) C++ (e.g., OpenMP [including offload], OpenACC, Kokkos, SyCL, CUDA, and HIP), (2) Fortran (e.g., OpenMP [including offl…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: Accepted at the Sixteenth International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2), 2023 to be held in conjunction with ICPP 2023: The 52nd International Conference on Parallel Processing. 10 pages, 6 figures, 5 tables

  4. Evaluating performance and portability of high-level programming models: Julia, Python/Numba, and Kokkos on exascale nodes

    Authors: William F. Godoy, Pedro Valero-Lara, T. Elise Dettling, Christian Trefftz, Ian Jorquera, Thomas Sheehy, Ross G. Miller, Marc Gonzalez-Tallada, Jeffrey S. Vetter, Valentin Churavy

    Abstract: We explore the performance and portability of the high-level programming models: the LLVM-based Julia and Python/Numba, and Kokkos on high-performance computing (HPC) nodes: AMD Epyc CPUs and MI250X graphical processing units (GPUs) on Frontier's test bed Crusher system and Ampere's Arm-based CPUs and NVIDIA's A100 GPUs on the Wombat system at the Oak Ridge Leadership Computing Facilities. We comp…

    Submitted 10 March, 2023; originally announced March 2023.

    Comments: Accepted at the 28th HIPS workshop, held in conjunction with IPDPS 2023. 10 pages, 9 figures

  5. cuConv: A CUDA Implementation of Convolution for CNN Inference

    Authors: Marc Jordà, Pedro Valero-Lara, Antonio J. Peña

    Abstract: Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs). Current GPU architectures are highly efficient for training and deploying deep CNNs, and hence, these are largely used in production for this purpose. State-of-the-art implementations, however, present a lack of efficiency for some commonly used network configurations. In this paper w…

    Submitted 30 March, 2021; originally announced March 2021.

    Comments: This work has been submitted to the Springer for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

    Journal ref: Cluster Comput (2022)

  6. MPI+OpenMP Tasking Scalability for Multi-Morphology Simulations of the Human Brain

    Authors: Pedro Valero-Lara, Raül Sirvent, Antonio J. Peña, Jesús Labarta

    Abstract: The simulation of the behavior of the human brain is one of the most ambitious challenges today with a non-end of important applications. We can find many different initiatives in the USA, Europe and Japan which attempt to achieve such a challenging target. In this work, we focus on the most important European initiative (the Human Brain Project) and on one of the models developed in this project.…

    Submitted 13 May, 2020; originally announced May 2020.

    Journal ref: P. Valero-Lara, R. Sirvent, A. J. Peña, and J. Labarta. "MPI+OpenMP tasking scalability for multi-morphology simulations of the human brain", Parallel Computing, Elsevier, vol. 84, pp. 50-61, May 2019