Showing 1–3 of 3 results for author: Horowitz, M A

  1. arXiv:2306.09552

    cs.AR

    Retrospective: EIE: Efficient Inference Engine on Sparse and Compressed Neural Network

    Authors: Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally

    Abstract: EIE proposed to accelerate pruned and compressed neural networks, exploiting weight sparsity, activation sparsity, and 4-bit weight-sharing in neural network accelerators. Since published in ISCA'16, it opened a new design space to accelerate pruned and sparse neural networks and spawned many algorithm-hardware co-designs for model compression and acceleration, both in academia and commercial AI c…

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: Invited retrospective paper at ISCA 2023

  2. Dark Memory and Accelerator-Rich System Optimization in the Dark Silicon Era

    Authors: Ardavan Pedram, Stephen Richardson, Sameh Galal, Shahar Kvatinsky, Mark A. Horowitz

    Abstract: The key challenge to improving performance in the age of Dark Silicon is how to leverage transistors when they cannot all be used at the same time. In modern SOCs, these transistors are often used to create specialized accelerators which improve energy efficiency for some applications by 10-1000X. While this might seem like the magic bullet we need, for most CPU applications more energy is dissipa…

    Submitted 26 April, 2016; v1 submitted 12 February, 2016; originally announced February 2016.

    Comments: 8 pages, To appear in IEEE Design and Test Journal

    Journal ref: IEEE Design & Test (Volume: 34, Issue: 2, April 2017)

  3. arXiv:1602.01528

    cs.CV cs.AR

    EIE: Efficient Inference Engine on Compressed Deep Neural Network

    Authors: Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally

    Abstract: State-of-the-art deep neural networks (DNNs) have hundreds of millions of connections and are both computationally and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources and power budgets. While custom hardware helps the computation, fetching weights from DRAM is two orders of magnitude more expensive than ALU operations, and dominates the require…

    Submitted 3 May, 2016; v1 submitted 3 February, 2016; originally announced February 2016.

    Comments: External Links: TheNextPlatform: http://goo.gl/f7qX0L ; O'Reilly: https://goo.gl/Id1HNT ; Hacker News: https://goo.gl/KM72SV ; Embedded-vision: http://goo.gl/joQNg8 ; Talk at NVIDIA GTC'16: http://goo.gl/6wJYvn ; Talk at Embedded Vision Summit: https://goo.gl/7abFNe ; Talk at Stanford University: https://goo.gl/6lwuer . Published as a conference paper in ISCA 2016