Showing 1–21 of 21 results for author: Navarro, C A

  1. arXiv:2406.17284  [pdf, other]

    cs.DC

    CAT: Cellular Automata on Tensor cores

    Authors: Cristóbal A. Navarro, Felipe A. Quezada, Enzo Meneses, Héctor Ferrada, Nancy Hitschfeld

    Abstract: Cellular automata (CA) are simulation models that can produce complex emergent behaviors from simple local rules. Although state-of-the-art GPU solutions are already fast due to their data-parallel nature, their performance can rapidly degrade in CA with a large neighborhood radius. With the inclusion of tensor cores across the entire GPU ecosystem, interest has grown in finding ways to leverage t…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 15 pages

  2. arXiv:2306.10959  [pdf, other]

    cs.CV cs.AI cs.LG

    RaViTT: Random Vision Transformer Tokens

    Authors: Felipe A. Quezada, Carlos F. Navarro, Cristian Muñoz, Manuel Zamorano, Jorge Jara-Wilde, Violeta Chang, Cristóbal A. Navarro, Mauricio Cerda

    Abstract: Vision Transformers (ViTs) have successfully been applied to image classification problems where large annotated datasets are available. On the other hand, when fewer annotations are available, such as in biomedical applications, image augmentation techniques like introducing image variations or combinations have been proposed. However, regarding ViT patch sampling, less has been explored outside…

    Submitted 19 June, 2023; originally announced June 2023.

    Comments: 9 pages, 6 figures

    MSC Class: 68T07

  3. arXiv:2306.03282  [pdf, other]

    cs.DC cs.CG cs.DS

    Accelerating Range Minimum Queries with Ray Tracing Cores

    Authors: Enzo Meneses, Cristóbal A. Navarro, Héctor Ferrada, Felipe A. Quezada

    Abstract: During the last decade GPU technology has shifted from pure general purpose computation to the inclusion of application specific integrated circuits (ASICs), such as Tensor Cores and Ray Tracing (RT) cores. Although these special purpose GPU cores were designed to further accelerate specific fields such as AI and real-time rendering, recent research has managed to exploit them to further accelerat…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: 17 Figures

  4. arXiv:2303.10581  [pdf, other]

    cs.DC cs.CG

    An Evaluation of GPU Filters for Accelerating the 2D Convex Hull

    Authors: Roberto Carrasco, Héctor Ferrada, Cristóbal A. Navarro, Nancy Hitschfeld

    Abstract: The Convex Hull algorithm is one of the most important algorithms in computational geometry, with many applications such as in computer graphics, robotics, and data mining. Despite the advances in the new algorithms in this area, it is often needed to improve the performance to solve more significant problems quickly or in real-time processing. This work presents an experimental evaluation of GPU…

    Submitted 19 March, 2023; originally announced March 2023.

  5. arXiv:2209.12310  [pdf, other]

    cs.DC

    Accelerating the Convex Hull Computation with a Parallel GPU Algorithm

    Authors: Alan Keith, Héctor Ferrada, Cristóbal A. Navarro

    Abstract: The convex hull is a fundamental geometrical structure for many applications where groups of points must be enclosed or represented by a convex polygon. Although efficient sequential convex hull algorithms exist, and are constantly being used in applications, their computation time is often considered an issue for time-sensitive tasks such as real-time collision detection, clustering or image proc…

    Submitted 25 September, 2022; originally announced September 2022.

    Comments: 7 pages, in Spanish language

  6. arXiv:2209.00117  [pdf, other]

    cs.DC

    GPU Voronoi Diagrams for Random Moving Seeds

    Authors: Rodrigo Stevenson, Cristóbal A. Navarro

    Abstract: The Voronoi Diagram is a geometrical structure that is widely used in scientific or technological applications where proximity is a relevant aspect to consider, and it also resembles natural phenomena such as cellular banks, rock formations or bee hives, among others. Typically, computing the Voronoi Diagram is done in a static context, that is, the location of the input seeds is defined once and…

    Submitted 31 August, 2022; originally announced September 2022.

    Comments: 6 pages

  7. arXiv:2209.00103  [pdf, other]

    cs.DC

    GGArray: A Dynamically Growable GPU Array

    Authors: Enzo Meneses, Cristóbal A. Navarro, Héctor Ferrada

    Abstract: We present a dynamically Growable GPU array (GGArray) fully implemented in GPU that does not require synchronization with the host. The idea is to improve the programming of GPU applications that require dynamic memory, by offering a structure that does not require pre-allocating GPU VRAM for the worst case scenario. The GGArray is based on the LFVector, by utilizing an array of them in order to t…

    Submitted 7 September, 2022; v1 submitted 31 August, 2022; originally announced September 2022.

    Comments: 8 pages

  8. arXiv:2208.11617  [pdf, other]

    cs.DC cs.DM

    A Scalable and Energy Efficient GPU Thread Map for m-Simplex Domains

    Authors: Cristóbal A. Navarro, Felipe A. Quezada, Benjamin Bustos, Nancy Hitschfeld, Rolando Kindelan

    Abstract: This work proposes a new GPU thread map for $m$-simplex domains, that scales its speedup with dimension and is energy efficient compared to other state of the art approaches. The main contributions of this work are i) the formulation of the new block-space map $\mathcal{H}: \mathbb{Z}^m \mapsto \mathbb{Z}^m$ for regular orthogonal simplex domains, which is analyzed in terms of resource usage, and…

    Submitted 12 September, 2022; v1 submitted 24 August, 2022; originally announced August 2022.

    Comments: 13 pages

  9. arXiv:2206.02255  [pdf, other]

    cs.DC cs.PF

    Modeling GPU Dynamic Parallelism for Self Similar Density Workloads

    Authors: Felipe A. Quezada, Cristóbal A. Navarro, Miguel Romero, Cristhian Aguilera

    Abstract: Dynamic Parallelism (DP) is a runtime feature of the GPU programming model that allows GPU threads to execute additional GPU kernels, recursively. Apart from making the programming of parallel hierarchical patterns easier, DP can also speed up problems that exhibit a heterogeneous data layout by focusing, through a subdivision process, the finite GPU resources on the sub-regions that exhibit more p…

    Submitted 5 June, 2022; originally announced June 2022.

    Comments: submitted to Journal

  10. arXiv:2201.00613  [pdf, other]

    cs.DC cs.CG cs.DM

    Squeeze: Efficient Compact Fractals for Tensor Core GPUs

    Authors: Felipe A. Quezada, Cristóbal A. Navarro, Nancy Hitschfeld, Benjamin Bustos

    Abstract: This work presents Squeeze, an efficient compact fractal processing scheme for tensor core GPUs. By combining discrete-space transformations between compact and expanded forms, one can do data-parallel computation on a fractal with neighborhood access without needing to expand the fractal in memory. The space transformations are formulated as two GPU tensor-core accelerated thread maps, $λ(ω)$ and…

    Submitted 3 January, 2022; originally announced January 2022.

  11. arXiv:2110.12952  [pdf, other]

    cs.DC

    Accelerating Compact Fractals with Tensor Core GPUs

    Authors: Felipe A. Quezada, Cristóbal A. Navarro

    Abstract: This work presents a GPU thread mapping approach that allows doing fast parallel stencil-like computations on discrete fractals using their compact representation. The intuition behind is to employ two GPU tensor-core accelerated thread maps, $λ(ω)$ and $ν(ω)$, which act as threadspace-to-dataspace and dataspace-to-threadspace functions, respectively. By combining these maps, threads can access co…

    Submitted 25 October, 2021; originally announced October 2021.

    Comments: Tech Report

  12. arXiv:2004.13475  [pdf, other]

    cs.DC

    Efficient GPU Thread Mapping on Embedded 2D Fractals

    Authors: Cristóbal A. Navarro, Felipe A. Quezada, Nancy Hitschfeld, Raimundo Vega, Benjamin Bustos

    Abstract: This work proposes a new approach for mapping GPU threads onto a family of discrete embedded 2D fractals. A block-space map $λ: \mathbb{Z}_{\mathbb{E}}^{2} \mapsto \mathbb{Z}_{\mathbb{F}}^{2}$ is proposed, from Euclidean parallel space $\mathbb{E}$ to embedded fractal space $\mathbb{F}$, that maps in $\mathcal{O}(\log_2 \log_2(n))$ time and uses no more than $\mathcal{O}(n^\mathbb{H})$ threads wit…

    Submitted 25 April, 2020; originally announced April 2020.

    Comments: 20 Pages. arXiv admin note: text overlap with arXiv:1706.04552

    ACM Class: C.1.4; G.2.0

  13. arXiv:2001.05585  [pdf, ps, other]

    cs.DC

    GPU Tensor Cores for fast Arithmetic Reductions

    Authors: Cristóbal A. Navarro, Roberto Carrasco, Ricardo J. Barrientos, Javier A. Riquelme, Raimundo Vega

    Abstract: This work proposes a GPU tensor core approach that encodes the arithmetic reduction of $n$ numbers as a set of chained $m \times m$ matrix multiply accumulate (MMA) operations executed in parallel by GPU tensor cores. The asymptotic running time of the proposed chained tensor core approach is $T(n)=5 \log_{m^2}{n}$ and its speedup is $S=\dfrac{4}{5} \log_{2}{m^2}$ over the classic $O(n \log n)$ para…

    Submitted 15 January, 2020; originally announced January 2020.

    Comments: 14 pages, 11 figures

  14. Analyzing GPU Tensor Core Potential for Fast Reductions

    Authors: Roberto Carrasco, Raimundo Vega, Cristóbal A. Navarro

    Abstract: The Nvidia GPU architecture has introduced new computing elements such as the \textit{tensor cores}, which are special processing units dedicated to perform fast matrix-multiply-accumulate (MMA) operations and accelerate \textit{Deep Learning} applications. In this work we present the idea of using tensor cores for a different purpose such as the parallel arithmetic reduction problem, and propose…

    Submitted 8 March, 2019; originally announced March 2019.

    Comments: This paper was presented in the SCCC 2018 Conference, November 5

    Journal ref: 37th International Conference of the Chilean Computer Science Society, SCCC 2018, November 5-9, Santiago, Chile, 2018

  15. arXiv:1706.04552  [pdf, ps, other]

    cs.DC

    Block-space GPU Mapping for Embedded Sierpiński Gasket Fractals

    Authors: Cristóbal A. Navarro, Benjamín Bustos, Raimundo Vega, Nancy Hitschfeld

    Abstract: This work studies the problem of GPU thread mapping for a Sierpiński gasket fractal embedded in a discrete Euclidean space of $n \times n$. A block-space map $λ: \mathbb{Z}_{\mathbb{E}}^{2} \mapsto \mathbb{Z}_{\mathbb{F}}^{2}$ is proposed, from Euclidean parallel space $\mathbb{E}$ to embedded fractal space $\mathbb{F}$, that maps in $\mathcal{O}(\log_2 \log_2(n))$ time and uses no more than…

    Submitted 14 June, 2017; originally announced June 2017.

    Comments: 7 pages, 8 Figures

  16. arXiv:1610.07394  [pdf, other]

    cs.DC

    Possibilities of Recursive GPU Mapping for Discrete Orthogonal Simplices

    Authors: Cristóbal A. Navarro, Benjamín Bustos, Nancy Hitschfeld

    Abstract: The problem of parallel thread mapping is studied for the case of discrete orthogonal $m$-simplices. The possibility of a $O(1)$ time recursive block-space map $λ: \mathbb{Z}^m \mapsto \mathbb{Z}^m$ is analyzed from the point of view of parallel space efficiency and potential performance improvement. The $2$-simplex and $3$-simplex are analyzed as special cases, where constant time maps are found,…

    Submitted 24 October, 2016; originally announced October 2016.

  17. arXiv:1609.01490  [pdf, ps, other]

    cs.DC

    A Non-linear GPU Thread Map for Triangular Domains

    Authors: Cristóbal A. Navarro, Benjamín Bustos, Nancy Hitschfeld

    Abstract: There is a stage in the GPU computing pipeline where a grid of thread-blocks, in \textit{parallel space}, is mapped onto the problem domain, in \textit{data space}. Since the parallel space is restricted to a box type geometry, the mapping approach is typically a $k$-dimensional bounding box (BB) that covers a $p$-dimensional data space. Threads that fall inside the domain perform computations whi…

    Submitted 6 September, 2016; originally announced September 2016.

    Comments: 16 pages, 7 Figures

  18. arXiv:1606.08881  [pdf, ps, other]

    cs.DC

    Potential benefits of a block-space GPU approach for discrete tetrahedral domains

    Authors: Cristóbal A. Navarro, Benjamín Bustos, Nancy Hitschfeld

    Abstract: The study of data-parallel domain re-organization and thread-mapping techniques are relevant topics as they can increase the efficiency of GPU computations when working on spatial discrete domains with non-box-shaped geometry. In this work we study the potential benefits of applying a succinct data re-organization of a tetrahedral data-parallel domain of size $\mathcal{O}(n^3)$ combined with an eff…

    Submitted 28 June, 2016; originally announced June 2016.

  19. arXiv:1508.06268  [pdf, other]

    physics.comp-ph cond-mat.stat-mech cs.DC

    Adaptive Multi-GPU Exchange Monte Carlo for the 3D Random Field Ising Model

    Authors: C. A. Navarro, Wei Huang, Youjin Deng

    Abstract: We present an adaptive multi-GPU Exchange Monte Carlo method designed for the simulation of the 3D Random Field Model. The algorithm design is based on a two-level parallelization scheme that allows the method to scale its performance in the presence of faster GPUs as well as multiple GPUs. The set of temperatures is adapted according to the exchange rate observed from short trial runs, leadin…

    Submitted 22 September, 2015; v1 submitted 25 August, 2015; originally announced August 2015.

    Comments: 15 pages, 10 figures

    Journal ref: Computer Physics Communications, Volume 205, August 2016, pp 48-60

  20. arXiv:1308.1419  [pdf, ps, other]

    cs.DC cs.AR

    Improving the GPU space of computation under triangular domain problems

    Authors: Cristobal A. Navarro, Nancy Hitschfeld

    Abstract: There is a stage in the GPU computing pipeline where a grid of thread-blocks is mapped to the problem domain. Normally, this grid is a k-dimensional bounding box that covers a k-dimensional problem no matter its shape. Threads that fall inside the problem domain perform computations, otherwise they are discarded at runtime. For problems with non-square geometry, this is not always the best idea be…

    Submitted 6 August, 2013; originally announced August 2013.

    Comments: 6 pages, 9 Figures

  21. arXiv:1305.6325  [pdf, ps, other]

    physics.comp-ph cond-mat.stat-mech cs.DC

    Multi-core computation of transfer matrices for strip lattices in the Potts model

    Authors: Cristobal A. Navarro, Fabrizio Canfora, Nancy Hitschfeld Kahler

    Abstract: The transfer-matrix technique is a convenient way for studying strip lattices in the Potts model since the computational costs depend just on the periodic part of the lattice and not on the whole. However, even when the cost is reduced, the transfer-matrix technique is still an NP-hard problem since the time T(|V|, |E|) needed to compute the matrix grows exponentially as a function of the grap…

    Submitted 13 August, 2013; v1 submitted 27 May, 2013; originally announced May 2013.