
Showing 1–11 of 11 results for author: Peña, A J

Searching in archive cs.
  1. arXiv:2306.13002

    cs.DC

    ACC Saturator: Automatic Kernel Optimization for Directive-Based GPU Code

    Authors: Kazuaki Matsumura, Simon Garcia De Gonzalo, Antonio J. Peña

    Abstract: Automatic code optimization is a complex process that typically involves the application of multiple discrete algorithms that modify the program structure irreversibly. However, the design of these algorithms is often monolithic, and they require repetitive implementation to perform similar analyses due to the lack of cooperation. To address this issue, modern optimization techniques, such as equa…

    Submitted 23 June, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

  2. A Symbolic Emulator for Shuffle Synthesis on the NVIDIA PTX Code

    Authors: Kazuaki Matsumura, Simon Garcia De Gonzalo, Antonio J. Peña

    Abstract: Various kinds of applications take advantage of GPUs through automation tools that attempt to automatically exploit the available performance of the GPU's parallel architecture. Directive-based programming models, such as OpenACC, are one such method that easily enables parallel computing by just adhering code annotations to code loops. Such abstract models, however, often prevent programmers from…

    Submitted 26 January, 2023; originally announced January 2023.

    Comments: To appear in: Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction (CC '23)

  3. JACC: An OpenACC Runtime Framework with Kernel-Level and Multi-GPU Parallelization

    Authors: Kazuaki Matsumura, Simon Garcia De Gonzalo, Antonio J. Peña

    Abstract: The rapid development in computing technology has paved the way for directive-based programming models towards a principal role in maintaining software portability of performance-critical applications. Efforts on such models involve a least engineering cost for enabling computational acceleration on multiple architectures while programmers are only required to add meta information upon sequential…

    Submitted 27 April, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: Extended version of a paper to appear in: Proceedings of the 28th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), December 17-18, 2021

  4. arXiv:2106.12485

    cs.DC physics.comp-ph physics.plasm-ph

    Particle-In-Cell Simulation using Asynchronous Tasking

    Authors: Nicolas Guidotti, Pedro Ceyrat, João Barreto, José Monteiro, Rodrigo Rodrigues, Ricardo Fonseca, Xavier Martorell, Antonio J. Peña

    Abstract: Recently, task-based programming models have emerged as a prominent alternative among shared-memory parallel programming paradigms. Inherently asynchronous, these models provide native support for dynamic load balancing and incorporate data flow concepts to selectively synchronize the tasks. However, tasking models are yet to be widely adopted by the HPC community and their effective advantages wh…

    Submitted 29 August, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

    Comments: Published on the 27th European Conference on Parallel and Distributed Computing (Euro-Par 2021)

    Journal ref: Euro-Par 2021: Parallel Processing. Lecture Notes in Computer Science, vol 12820, pp. 482-498

  5. cuConv: A CUDA Implementation of Convolution for CNN Inference

    Authors: Marc Jordà, Pedro Valero-Lara, Antonio J. Peña

    Abstract: Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs). Current GPU architectures are highly efficient for training and deploying deep CNNs, and hence, these are largely used in production for this purpose. State-of-the-art implementations, however, present a lack of efficiency for some commonly used network configurations. In this paper w…

    Submitted 30 March, 2021; originally announced March 2021.

    Comments: This work has been submitted to the Springer for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

    Journal ref: Cluster Comput (2022)

  6. arXiv:2103.16139

    cs.CR cs.LG cs.PF

    Enabling Homomorphically Encrypted Inference for Large DNN Models

    Authors: Guillermo Lloret-Talavera, Marc Jorda, Harald Servat, Fabian Boemer, Chetan Chauhan, Shigeki Tomishima, Nilesh N. Shah, Antonio J. Peña

    Abstract: The proliferation of machine learning services in the last few years has raised data privacy concerns. Homomorphic encryption (HE) enables inference using encrypted data but it incurs 100x-10,000x memory and runtime overheads. Secure deep neural network (DNN) inference using HE is currently limited by computing and memory resources, with frameworks requiring hundreds of gigabytes of DRAM to evalua…

    Submitted 29 April, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: Manuscript accepted for publication in IEEE Transactions on Computers

  7. MPI+OpenMP Tasking Scalability for Multi-Morphology Simulations of the Human Brain

    Authors: Pedro Valero-Lara, Raül Sirvent, Antonio J. Peña, Jesús Labarta

    Abstract: The simulation of the behavior of the human brain is one of the most ambitious challenges today with a non-end of important applications. We can find many different initiatives in the USA, Europe and Japan which attempt to achieve such a challenging target. In this work, we focus on the most important European initiative (the Human Brain Project) and on one of the models developed in this project.…

    Submitted 13 May, 2020; originally announced May 2020.

    Journal ref: P. Valero-Lara, R. Sirvent, A. J. Peña, and J. Labarta. "MPI+OpenMP tasking scalability for multi-morphology simulations of the human brain", Parallel Computing, Elsevier, vol. 84, pp. 50-61, May 2019

  8. DMR API: Improving cluster productivity by turning applications into malleable

    Authors: Sergio Iserte, Rafael Mayo, Enrique S. Quintana-Orti, Vicenc Beltran, Antonio J. Peña

    Abstract: Adaptive workloads can change on-the-fly the configuration of their jobs, in terms of number of processes. In order to carry out these job reconfigurations, we have designed a methodology which enables a job to communicate with the resource manager and, through the runtime, to change its number of MPI ranks. The collaboration between both the workload manager, aware of the queue of jobs and the…

    Submitted 28 May, 2020; v1 submitted 12 May, 2020; originally announced May 2020.

    Journal ref: S. Iserte, R. Mayo, E. S. Quintana-Orti, V. Beltran, and A. J. Peña, "DMR API: Improving cluster productivity by turning applications into malleable", Parallel Computing, Elsevier, vol. 78, pp. 54-66, Oct. 2018

  9. Understanding Memory Access Patterns Using the BSC Performance Tools

    Authors: Harald Servat, Jesús Labarta, Hans-Christian Hoppe, Judit Giménez, Antonio J. Peña

    Abstract: The growing gap between processor and memory speeds results in complex memory hierarchies as processors evolve to mitigate such divergence by taking advantage of the locality of reference. In this direction, the BSC performance analysis tools have been recently extended to provide insight relative to the application memory accesses depicting their temporal and spatial characteristics, correlating…

    Submitted 28 May, 2020; v1 submitted 12 May, 2020; originally announced May 2020.

    Journal ref: H. Servat, J. Labarta, H. C. Hoppe, J. Giménez, and A. J. Peña, "Understanding memory access patterns using the BSC performance tools", Parallel Computing, Elsevier, vol. 78, pp. 1-14, Oct. 2018

  10. Integrating Blocking and Non-Blocking MPI Primitives with Task-Based Programming Models

    Authors: Kevin Sala, Xavier Teruel, Josep M. Perez, Antonio J. Peña, Vicenç Beltran, Jesus Labarta

    Abstract: In this paper we present the Task-Aware MPI library (TAMPI) that integrates both blocking and non-blocking MPI primitives with task-based programming models. The TAMPI library leverages two new runtime APIs to improve both programmability and performance of hybrid applications. The first API allows to pause and resume the execution of a task depending on external events. This API is used to improv…

    Submitted 29 May, 2020; v1 submitted 10 January, 2019; originally announced January 2019.

    Comments: European Commission's projects: INTERTWinE (EC-H2020-671602), Marie Skłodowska-Curie (EC-H2020-749516). Postprint submitted to the Parallel Computing Journal (Elsevier). Figures from section 7.2 updated, typos corrected

    Journal ref: Parallel Computing, 85, 153-166 (2019)

  11. arXiv:1810.04150

    cs.DC

    Exploring the Vision Processing Unit as Co-processor for Inference

    Authors: Sergio Rivas-Gomez, Antonio J. Peña, David Moloney, Erwin Laure, Stefano Markidis

    Abstract: The success of the exascale supercomputer is largely debated to remain dependent on novel breakthroughs in technology that effectively reduce the power consumption and thermal dissipation requirements. In this work, we consider the integration of co-processors in high-performance computing (HPC) to enable low-power, seamless computation offloading of certain operations. In particular, we explore t…

    Submitted 9 October, 2018; originally announced October 2018.