Skip to main content

Showing 1–6 of 6 results for author: Narayanaswami, R

Searching in archive cs. Search in all archives.
.
  1. arXiv:2109.14320  [pdf, other

    cs.AR cs.LG

    Google Neural Network Models for Edge Devices: Analyzing and Mitigating Machine Learning Inference Bottlenecks

    Authors: Amirali Boroumand, Saugata Ghose, Berkin Akin, Ravi Narayanaswami, Geraldo F. Oliveira, Xiaoyu Ma, Eric Shiu, Onur Mutlu

    Abstract: Emerging edge computing platforms often contain machine learning (ML) accelerators that can accelerate inference for a wide range of neural network (NN) models. These models are designed to fit within the limited area and energy constraints of the edge computing platforms, each targeting various applications (e.g., face detection, speech recognition, translation, image captioning, video analytics)… ▽ More

    Submitted 29 September, 2021; originally announced September 2021.

    Comments: This work appears at the 30th International Conference on Parallel Architectures and Compilation Techniques (PACT 2021). arXiv admin note: substantial text overlap with arXiv:2103.00768

  2. arXiv:2103.00768  [pdf, other

    cs.AR cs.LG

    Mitigating Edge Machine Learning Inference Bottlenecks: An Empirical Study on Accelerating Google Edge Models

    Authors: Amirali Boroumand, Saugata Ghose, Berkin Akin, Ravi Narayanaswami, Geraldo F. Oliveira, Xiaoyu Ma, Eric Shiu, Onur Mutlu

    Abstract: As the need for edge computing grows, many modern consumer devices now contain edge machine learning (ML) accelerators that can compute a wide range of neural network (NN) models while still fitting within tight resource constraints. We analyze a commercial Edge TPU using 24 Google edge NN models (including CNNs, LSTMs, transducers, and RCNNs), and find that the accelerator suffers from three shor… ▽ More

    Submitted 1 March, 2021; originally announced March 2021.

  3. arXiv:2102.10423  [pdf, other

    cs.LG cs.AR

    An Evaluation of Edge TPU Accelerators for Convolutional Neural Networks

    Authors: Kiran Seshadri, Berkin Akin, James Laudon, Ravi Narayanaswami, Amir Yazdanbakhsh

    Abstract: Edge TPUs are a domain of accelerators for low-power, edge devices and are widely used in various Google products such as Coral and Pixel devices. In this paper, we first discuss the major microarchitectural details of Edge TPUs. Then, we extensively evaluate three classes of Edge TPUs, covering different computing ecosystems, that are either currently deployed in Google products or are the produc… ▽ More

    Submitted 11 October, 2022; v1 submitted 20 February, 2021; originally announced February 2021.

    Comments: 13 pages, 15 figures, 8 tables, published in IISWC 2022

  4. arXiv:2102.08619  [pdf, other

    cs.LG cs.AR

    Rethinking Co-design of Neural Architectures and Hardware Accelerators

    Authors: Yanqi Zhou, Xuanyi Dong, Berkin Akin, Mingxing Tan, Daiyi Peng, Tianjian Meng, Amir Yazdanbakhsh, Da Huang, Ravi Narayanaswami, James Laudon

    Abstract: Neural architectures and hardware accelerators have been two driving forces for the progress in deep learning. Previous works typically attempt to optimize hardware given a fixed model architecture or model architecture given fixed hardware. And the dominant hardware architecture explored in this prior work is FPGAs. In our work, we target the optimization of hardware and software configurations o… ▽ More

    Submitted 17 February, 2021; originally announced February 2021.

  5. arXiv:2102.01723  [pdf, other

    cs.LG cs.AR

    Apollo: Transferable Architecture Exploration

    Authors: Amir Yazdanbakhsh, Christof Angermueller, Berkin Akin, Yanqi Zhou, Albin Jones, Milad Hashemi, Kevin Swersky, Satrajit Chatterjee, Ravi Narayanaswami, James Laudon

    Abstract: The looming end of Moore's Law and ascending use of deep learning drives the design of custom accelerators that are optimized for specific neural architectures. Architecture exploration for such accelerators forms a challenging constrained optimization problem over a complex, high-dimensional, and structured input space with a costly to evaluate objective function. Existing approaches for accelera… ▽ More

    Submitted 2 February, 2021; originally announced February 2021.

    Comments: 10 pages, 5 figures, Accepted to Workshop on ML for Systems at the 34th Conference on Neural Information Processing Systems (NeurIPS 2020)

  6. arXiv:1704.04760  [pdf

    cs.AR cs.LG cs.NE

    In-Datacenter Performance Analysis of a Tensor Processing Unit

    Authors: Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg , et al. (50 additional authors not shown)

    Abstract: Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU)---deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOp… ▽ More

    Submitted 16 April, 2017; originally announced April 2017.

    Comments: 17 pages, 11 figures, 8 tables. To appear at the 44th International Symposium on Computer Architecture (ISCA), Toronto, Canada, June 24-28, 2017