
Flexagon: A Multi-Dataflow Sparse-Sparse Matrix Multiplication Accelerator for Efficient DNN Processing

Authors: Francisco Muñoz-Martínez, Raveesh Garg, José L. Abellán, Michael Pellauer, Manuel E. Acacio, Tushar Krishna

Abstract: Sparsity is a growing trend in modern DNN models. Existing Sparse-Sparse Matrix Multiplication (SpMSpM) accelerators are tailored to a particular SpMSpM dataflow (i.e., Inner Product, Outer Product, or Gustavson's), which determines their overall efficiency. We demonstrate that this static decision inherently results in a suboptimal dynamic solution. This is because different SpMSpM kernels show varying features (i.e., dimensions, sparsity pattern, sparsity degree), which makes each dataflow better suited to different data sets. In this work we present Flexagon, the first reconfigurable SpMSpM accelerator capable of performing SpMSpM computation using the dataflow that best matches each case. The Flexagon accelerator is based on a novel Merger-Reduction Network (MRN) that unifies the concepts of reducing and merging in the same substrate, increasing efficiency. Additionally, Flexagon includes a 3-tier memory hierarchy specifically tailored to the different access characteristics of the input and output compressed matrices. Using detailed cycle-level simulation of contemporary DNN models from a variety of application domains, we show that Flexagon achieves average performance benefits of 4.59x, 1.71x, and 1.35x with respect to the state-of-the-art SIGMA-like, SpArch-like, and GAMMA-like accelerators (265%, 67%, and 18%, respectively, in terms of average performance/area efficiency).

Submitted 25 January, 2023; originally announced January 2023.

Comments: To appear in ASPLOS 2023
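The abstract contrasts three SpMSpM dataflows. As a minimal software sketch under our own assumptions (a dict-of-dicts sparse format and the illustrative helpers `to_rows`, `to_cols`, `inner_product`, `outer_product`, and `gustavson`, none of which come from the paper), the code below shows how each dataflow orders the same computation. It is not Flexagon's hardware design. Inner Product intersects the nonzero coordinates of a row of A and a column of B. Outer Product produces one partial-product matrix per shared index k that must then be merged and reduced. Gustavson's builds each output row by accumulating scaled rows of B.

```python
from collections import defaultdict

# Illustrative sparse format (assumption): dict-of-dicts keyed by nonzero coordinates.
def to_rows(dense):
    """Row-major view: rows[i][k] = A[i][k] for nonzeros only (CSR-like)."""
    rows = {}
    for i, row in enumerate(dense):
        nz = {k: v for k, v in enumerate(row) if v != 0}
        if nz:
            rows[i] = nz
    return rows

def to_cols(dense):
    """Column-major view: cols[k][i] = A[i][k] for nonzeros only (CSC-like)."""
    cols = defaultdict(dict)
    for i, row in enumerate(dense):
        for k, v in enumerate(row):
            if v != 0:
                cols[k][i] = v
    return dict(cols)

def inner_product(a_rows, b_cols):
    """Inner Product: C[i][j] is the dot product of A's row i and B's column j,
    computed by intersecting their nonzero coordinates."""
    c = {}
    for i, arow in a_rows.items():
        for j, bcol in b_cols.items():
            common = arow.keys() & bcol.keys()
            if common:
                c[(i, j)] = sum(arow[k] * bcol[k] for k in common)
    return c

def outer_product(a_cols, b_rows):
    """Outer Product: column k of A times row k of B yields partial products,
    which are merged (reduced) into C."""
    c = defaultdict(int)
    for k, acol in a_cols.items():
        for i, av in acol.items():
            for j, bv in b_rows.get(k, {}).items():
                c[(i, j)] += av * bv
    return dict(c)

def gustavson(a_rows, b_rows):
    """Gustavson's (row-wise): output row i accumulates rows k of B,
    each scaled by the nonzero A[i][k]."""
    c = defaultdict(int)
    for i, arow in a_rows.items():
        for k, av in arow.items():
            for j, bv in b_rows.get(k, {}).items():
                c[(i, j)] += av * bv
    return dict(c)

A = [[1, 0, 2], [0, 0, 3], [4, 0, 0]]
B = [[0, 5, 0], [6, 0, 0], [0, 0, 7]]
c_ip = inner_product(to_rows(A), to_cols(B))
c_op = outer_product(to_cols(A), to_rows(B))
c_gu = gustavson(to_rows(A), to_rows(B))
print(c_ip == c_op == c_gu)   # True
print(sorted(c_gu.items()))   # [((0, 1), 5), ((0, 2), 14), ((1, 2), 21), ((2, 1), 20)]
```

All three produce the same nonzero output. What differs is the order in which operands are read and partial results are produced, which is why sparsity pattern and matrix shape can favor one dataflow over another.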

arXiv:2103.07977

Understanding the Design-Space of Sparse/Dense Multiphase GNN dataflows on Spatial Accelerators

Authors: Raveesh Garg, Eric Qin, Francisco Muñoz-Martínez, Robert Guirado, Akshay Jain, Sergi Abadal, José L. Abellán, Manuel E. Acacio, Eduard Alarcón, Sivasankaran Rajamanickam, Tushar Krishna

Abstract: Graph Neural Networks (GNNs) have garnered a lot of recent interest because of their success in learning representations from graph-structured data across several critical applications in cloud and HPC. Because GNNs have unique compute and memory characteristics arising from an interplay between dense and sparse phases of computation, the emergence of reconfigurable dataflow (aka spatial) accelerators offers promise for accelerating them by mapping optimized dataflows (i.e., computation order and parallelism) for both phases. The goal of this work is to characterize and understand the design-space of dataflow choices for running GNNs on spatial accelerators, so that mappers or design-space exploration tools can optimize the dataflow based on the workload. Specifically, we propose a taxonomy to describe all possible choices for mapping the dense and sparse phases of GNN inference, spatially and temporally, over a spatial accelerator, capturing both the intra-phase dataflow and the inter-phase (pipelined) dataflow. Using this taxonomy, we perform deep dives into the costs and benefits of several dataflows and carry out case studies on the implications of hardware parameters for dataflows and on the value of flexibility to support pipelined execution.

Submitted 6 March, 2022; v1 submitted 14 March, 2021; originally announced March 2021.

Comments: Accepted for publication at the 36th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2022)
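To make the "dense and sparse phases" concrete, the sketch below assumes a GCN-style layer (an assumption; the abstract does not fix a GNN model). It has a sparse aggregation phase over neighbor lists and a dense combination phase that multiplies by a weight matrix. The helper names `aggregate`, `combine`, and `op_counts` are hypothetical. Because matrix multiplication is associative, both phase orders give the same result, but they process different feature widths during the sparse phase. The rough operation counts (additions in aggregation, multiply-accumulates in combination) come from a simplified model of our own.

```python
# Hypothetical GCN-style layer: adjacency as neighbor lists (sparse),
# node features and weights as dense lists of lists.

def aggregate(adj, x):
    """Sparse phase (SpMM-like): each node sums its neighbors' feature vectors."""
    width = len(x[0])
    out = []
    for nbrs in adj:
        acc = [0] * width
        for n in nbrs:
            for f in range(width):
                acc[f] += x[n][f]
        out.append(acc)
    return out

def combine(x, w):
    """Dense phase (GEMM): multiply every feature vector by the weight matrix w."""
    return [[sum(row[k] * w[k][j] for k in range(len(w))) for j in range(len(w[0]))]
            for row in x]

def op_counts(num_nodes, nnz, f_in, f_out):
    """Rough operation counts for the two phase orders (simplified model):
    aggregation costs nnz * width, combination costs num_nodes * f_in * f_out."""
    aggr_first = nnz * f_in + num_nodes * f_in * f_out
    comb_first = num_nodes * f_in * f_out + nnz * f_out
    return aggr_first, comb_first

adj = [[1, 2], [0], [0, 1]]          # in-neighbors of nodes 0, 1, 2 (5 edges)
x = [[1, 2], [3, 4], [5, 6]]         # 3 nodes, 2 input features
w = [[1, 0, 1], [0, 1, 1]]           # 2 -> 3 features

ac = combine(aggregate(adj, x), w)   # aggregation, then combination
ca = aggregate(adj, combine(x, w))   # combination, then aggregation
print(ac == ca)                      # True: (A X) W == A (X W)
print(op_counts(3, 5, 2, 3))         # (28, 33)
```

Here aggregating first is cheaper because the layer widens the features (2 to 3). A layer that narrows them would flip the preference, which is one reason the phase order is worth exposing as a dataflow choice.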

arXiv:2006.07137

STONNE: A Detailed Architectural Simulator for Flexible Neural Network Accelerators

Authors: Francisco Muñoz-Martínez, José L. Abellán, Manuel E. Acacio, Tushar Krishna

Abstract: The design of specialized architectures for accelerating the inference procedure of Deep Neural Networks (DNNs) is a booming area of research nowadays. First-generation rigid proposals have been rapidly replaced by more advanced flexible accelerator architectures able to efficiently support a variety of layer types and dimensions. As the complexity of the designs grows, it is increasingly appealing for researchers to have cycle-accurate simulation tools at their disposal, allowing fast and accurate design-space exploration and rapid quantification of the efficacy of architectural enhancements during the early stages of a design. To this end, we present STONNE (Simulation TOol of Neural Network Engines), a cycle-accurate, highly modular, and highly extensible simulation framework that enables end-to-end evaluation of flexible accelerator architectures running complete contemporary DNN models. We use STONNE to model the recently proposed MAERI architecture and show how it can closely approach the performance results of the publicly available BSV-coded MAERI implementation. Then, we conduct a comprehensive evaluation and demonstrate that the folding strategy implemented for MAERI results in very low compute unit utilization (25% on average across 5 DNN models), which ultimately translates into poor performance.

Submitted 10 June, 2020; originally announced June 2020.
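The 25% figure refers to compute unit utilization. The sketch below is only a hedged illustration. It defines utilization as useful multiply-accumulates divided by multiplier-cycles and applies it to a toy folding model of our own, with hypothetical helpers `utilization` and `folded_passes`. This is not MAERI's actual folding strategy or STONNE's accounting. It shows how a dot product only slightly longer than the multiplier array can leave roughly half the multipliers idle.

```python
import math

def utilization(useful_macs, num_multipliers, cycles):
    """Fraction of multiplier-cycles that perform useful multiply-accumulates."""
    return useful_macs / (num_multipliers * cycles)

def folded_passes(k, num_multipliers):
    """Toy folding model (hypothetical): a length-k dot product spread over
    num_multipliers multipliers needs ceil(k / num_multipliers) passes,
    each pass taking one cycle with at most one MAC per multiplier."""
    return math.ceil(k / num_multipliers)

n = 64  # assumed number of multipliers
for k in (64, 65, 128):
    passes = folded_passes(k, n)
    print(k, passes, utilization(k, n, passes))
```

In this toy model, k = 65 on 64 multipliers needs two passes and reaches about 51% utilization, while k = 64 or 128 fills every slot.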

arXiv:1809.07787

doi 10.1016/j.optcom.2019.01.039

Fock-state superradiance in a cold atomic ensemble

Authors: D. F. Barros, L. F. Muñoz-Martínez, L. Ortiz-Gutiérrez, C. A. E. Guerra, J. E. O. Morales, R. S. N. Moreira, N. D. Alves, A. F. G. Tieco, D. Felinto, P. L. Saldanha

Abstract: A simplified theory is provided for the wavepackets of the photons emitted during the read process of a quantum memory formed by cold atoms. We arrive at analytical expressions for the single- and double-photon emissions, evidencing superradiant features in both cases. In the two-photon case, both photons are emitted in the same spatiotemporal mode, characterizing a superradiant emission of a Fock state of light with two excitations. Experiments confirm the theoretical predictions with satisfactory agreement.

Submitted 28 March, 2019; v1 submitted 20 September, 2018; originally announced September 2018.

Comments: 12 pages, 7 figures

Journal ref: Opt. Comm. 443, 34 (2019)

arXiv:1710.02905

doi 10.1103/PhysRevA.98.023823

Exploring six modes of an optical parametric oscillator

Authors: Luis F. Muñoz-Martínez, Felippe Alexandre Silva Barbosa, Antônio Sales Coelho, Luis Ortiz-Gutiérrez, Marcelo Martinelli, Paulo Nussenzveig, Alessandro S. Villar

Abstract: We measure the complete quantum state for six modes of the electromagnetic field produced by an optical parametric oscillator. The investigation involves the sidebands of the intense pump, signal, and idler fields generated by stimulated parametric downconversion inside a triply resonant optical resonator. We develop a theoretical model that successfully interprets the experimental results. The model takes into account the coupling of the field modes to the phonon bath of the nonlinear crystal, clearly showing the roles of different physical effects in shaping the structure of the quantum correlations between the six optical modes.

Submitted 20 February, 2018; v1 submitted 8 October, 2017; originally announced October 2017.

Comments: 11 pages, 5 figures

Journal ref: Phys. Rev. A 98, 023823 (2018)

arXiv:1312.5631

GRB 130606A within a sub-DLA at redshift 5.91

Authors: A. J. Castro-Tirado, R. Sánchez-Ramírez, S. L. Ellison, M. Jelínek, A. Martín-Carrillo, V. Bromm, J. Gorosabel, M. Bremer, J. M. Winters, L. Hanlon, S. Meegan, M. Topinka, S. B. Pandey, S. Guziy, S. Jeong, E. Sonbas, A. S. Pozanenko, R. Cunniffe, R. Fernández-Muñoz, P. Ferrero, N. Gehrels, R. Hudec, P. Kubánek, O. Lara-Gil, V. F. Muñoz-Martínez, et al. (16 additional authors not shown)

Abstract: Events such as GRB 130606A at z = 5.91 offer an exciting new window into pre-galactic metal enrichment in these very high redshift host galaxies. We study the environment and host galaxy of GRB 130606A, a high-z event, in the context of a high redshift population of GRBs. We have obtained multiwavelength observations from radio to gamma-ray, concentrating particularly on the X-ray evolution as well as the optical photometric and spectroscopic data analysis. With an initial bulk Lorentz factor in the range Γ_0 ~ 65-220, the X-ray afterglow evolution can be explained by a time-dependent photoionization of the local circumburst medium, within a compact and dense environment. The host galaxy is a sub-DLA (log N(HI) = 19.85 ± 0.15), with a metallicity content in the range from ~1/7 to ~1/60 of solar. Highly ionized species (N V and Si IV) are also detected. This is the second highest redshift burst with a measured GRB-DLA metallicity and only the third GRB absorber with a sub-DLA HI column density. GRB 'lighthouses' therefore offer enormous potential as backlighting sources to probe the ionization and metal enrichment state of the IGM at very high redshifts for the chemical signature of the first generation of stars.

Submitted 20 December, 2013; v1 submitted 19 December, 2013; originally announced December 2013.

Comments: 10 pages, 9 figures, submitted to A&A. Typos corrected

Showing 1–6 of 6 results for author: Muñoz-Martínez, F