
Showing 1–10 of 10 results for author: Judd, P

Searching in archive cs.
  1. arXiv:2209.05433  [pdf, other]

    cs.LG

    FP8 Formats for Deep Learning

    Authors: Paulius Micikevicius, Dusan Stosic, Neil Burgess, Marius Cornea, Pradeep Dubey, Richard Grisenthwaite, Sangwon Ha, Alexander Heinecke, Patrick Judd, John Kamalu, Naveen Mellempudi, Stuart Oberman, Mohammad Shoeybi, Michael Siu, Hao Wu

    Abstract: FP8 is a natural progression for accelerating deep learning training and inference beyond the 16-bit formats common in modern processors. In this paper we propose an 8-bit floating point (FP8) binary interchange format consisting of two encodings: E4M3 (4-bit exponent and 3-bit mantissa) and E5M2 (5-bit exponent and 2-bit mantissa). While E5M2 follows IEEE 754 conventions for representation of special…

    Submitted 29 September, 2022; v1 submitted 12 September, 2022; originally announced September 2022.

  2. arXiv:2004.09602  [pdf, other]

    cs.LG stat.ML

    Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation

    Authors: Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev, Paulius Micikevicius

    Abstract: Quantization techniques can reduce the size of Deep Neural Networks and improve inference latency and throughput by taking advantage of high throughput integer instructions. In this paper we review the mathematical aspects of quantization parameters and evaluate their choices on a wide range of neural network models for different application domains, including vision, speech, and language. We focu…

    Submitted 20 April, 2020; originally announced April 2020.

    Comments: 20 pages, 7 figures

  3. arXiv:1804.06732  [pdf, other]

    cs.NE

    DPRed: Making Typical Activation and Weight Values Matter In Deep Learning Computing

    Authors: Alberto Delmas, Sayeh Sharify, Patrick Judd, Kevin Siu, Milos Nikolic, Andreas Moshovos

    Abstract: We show that selecting a single data type (precision) for all values in Deep Neural Networks, even if that data type is different per layer, amounts to worst case design. Much shorter data types can be used if we target the common case by adjusting the precision at a much finer granularity. We propose Dynamic Precision Reduction (DPRed), where we group weights and activations and encode them using…

    Submitted 17 December, 2018; v1 submitted 16 April, 2018; originally announced April 2018.

  4. arXiv:1803.03688  [pdf, other]

    cs.NE

    Bit-Tactical: Exploiting Ineffectual Computations in Convolutional Neural Networks: Which, Why, and How

    Authors: Alberto Delmas, Patrick Judd, Dylan Malone Stuart, Zissis Poulos, Mostafa Mahmoud, Sayeh Sharify, Milos Nikolic, Andreas Moshovos

    Abstract: We show that, during inference with Convolutional Neural Networks (CNNs), more than 2x to 8x ineffectual work can be exposed if instead of targeting those weights and activations that are zero, we target different combinations of value stream properties. We demonstrate a practical application with Bit-Tactical (TCL), a hardware accelerator which exploits weight sparsity, per layer precision varia…

    Submitted 9 March, 2018; originally announced March 2018.

    Comments: An earlier version of this work titled "JaZ: Enabling Innovation Towards Chaff-Free Deep Learning Computing" was submitted for blind review

  5. arXiv:1707.09068  [pdf, other]

    cs.NE

    Tartan: Accelerating Fully-Connected and Convolutional Layers in Deep Learning Networks by Exploiting Numerical Precision Variability

    Authors: Alberto Delmas, Sayeh Sharify, Patrick Judd, Andreas Moshovos

    Abstract: Tartan (TRT), a hardware accelerator for inference with Deep Neural Networks (DNNs), is presented and evaluated on Convolutional Neural Networks. TRT exploits the variable per layer precision requirements of DNNs to deliver execution time that is proportional to the precision p in bits used per layer for convolutional and fully-connected layers. Prior art has demonstrated an accelerator with the s…

    Submitted 27 July, 2017; originally announced July 2017.

  6. arXiv:1706.07853  [pdf, ps, other]

    cs.DC cs.AR cs.LG

    Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks

    Authors: Sayeh Sharify, Alberto Delmas Lascorz, Kevin Siu, Patrick Judd, Andreas Moshovos

    Abstract: Loom (LM), a hardware inference accelerator for Convolutional Neural Networks (CNNs), is presented. In LM every bit of data precision that can be saved translates to proportional performance gains. Specifically, for convolutional layers LM's execution time is inversely proportional to the precisions of both weights and activations. For fully-connected layers LM's performance scales inversel…

    Submitted 16 May, 2018; v1 submitted 23 June, 2017; originally announced June 2017.

  7. arXiv:1706.00504  [pdf, other]

    cs.NE cs.LG

    Dynamic Stripes: Exploiting the Dynamic Precision Requirements of Activation Values in Neural Networks

    Authors: Alberto Delmas, Patrick Judd, Sayeh Sharify, Andreas Moshovos

    Abstract: Stripes is a Deep Neural Network (DNN) accelerator that uses bit-serial computation to offer performance that is proportional to the fixed-point precision of the activation values. The fixed-point precisions are determined a priori using profiling and are selected at a per layer granularity. This paper presents Dynamic Stripes, an extension to Stripes that detects precision variance at runtime and…

    Submitted 1 June, 2017; originally announced June 2017.

    Comments: 3 pages, 3 figures

  8. arXiv:1705.00125  [pdf, other]

    cs.LG

    Cnvlutin2: Ineffectual-Activation-and-Weight-Free Deep Neural Network Computing

    Authors: Patrick Judd, Alberto Delmas, Sayeh Sharify, Andreas Moshovos

    Abstract: We discuss several modifications and extensions over the previously proposed Cnvlutin (CNV) accelerator for convolutional and fully-connected layers of Deep Learning Networks. We first describe different encodings of the activations that are deemed ineffectual. The encodings have different memory overhead and energy characteristics. We propose using a level of indirection when accessing activations f…

    Submitted 28 April, 2017; originally announced May 2017.

    Comments: 6 pages, 5 figures

  9. arXiv:1610.06920  [pdf, other]

    cs.LG cs.AI cs.AR cs.CV

    Bit-pragmatic Deep Neural Network Computing

    Authors: J. Albericio, P. Judd, A. Delmás, S. Sharify, A. Moshovos

    Abstract: We quantify a source of ineffectual computations when processing the multiplications of the convolutional layers in Deep Neural Networks (DNNs) and propose Pragmatic (PRA), an architecture that exploits it, improving performance and energy efficiency. The source of these ineffectual computations is best understood in the context of conventional multipliers which generate internally multiple terms,…

    Submitted 20 October, 2016; originally announced October 2016.

  10. arXiv:1511.05236  [pdf, other]

    cs.LG cs.NE

    Reduced-Precision Strategies for Bounded Memory in Deep Neural Nets

    Authors: Patrick Judd, Jorge Albericio, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, Raquel Urtasun, Andreas Moshovos

    Abstract: This work investigates how using reduced precision data in Convolutional Neural Networks (CNNs) affects network accuracy during classification. More specifically, this study considers networks where each layer may use different precision data. Our key result is the observation that the tolerance of CNNs to reduced precision data not only varies across networks, a well established observation, but…

    Submitted 8 January, 2016; v1 submitted 16 November, 2015; originally announced November 2015.

    Comments: Submitted to ICLR 2016, 12 pages, 5 figures
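The FP8 entry (arXiv:2209.05433) names two encodings, E4M3 and E5M2. As a minimal sketch, assuming exponent biases of 7 and 15, that E5M2 reserves its all-ones exponent for infinities and NaNs as IEEE 754 does, and that E4M3 has no infinities and uses only the all-ones exponent-and-mantissa pattern for NaN (our reading of the full paper; the truncated abstract does not spell these rules out), a single byte can be decoded as follows. The function name decode_fp8 is hypothetical.

```python
import math

def decode_fp8(byte: int, exp_bits: int, man_bits: int) -> float:
    """Decode one 8-bit pattern as E4M3 (exp_bits=4, man_bits=3) or E5M2 (exp_bits=5, man_bits=2).

    Assumed rules (sketch): bias = 2**(exp_bits-1) - 1; E5M2 uses IEEE-style
    inf/NaN; E4M3 has no infinities and only S.1111.111 is NaN.
    """
    assert exp_bits + man_bits == 7 and 0 <= byte <= 0xFF
    bias = (1 << (exp_bits - 1)) - 1            # 7 for E4M3, 15 for E5M2
    sign = -1.0 if byte >> 7 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    exp_max = (1 << exp_bits) - 1
    if exp_bits == 5 and exp == exp_max:        # E5M2: IEEE 754 special values
        return sign * math.inf if man == 0 else math.nan
    if exp_bits == 4 and exp == exp_max and man == (1 << man_bits) - 1:
        return math.nan                         # E4M3: single NaN pattern (sign ignored)
    if exp == 0:                                # subnormal: no implicit leading 1
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)
```

Under these assumptions, 0x7E decodes to 448 in E4M3 and 0x7B to 57344 in E5M2, the largest finite value of each encoding; E4M3 gives up E5M2's dynamic range in exchange for one extra mantissa bit.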
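The integer-quantization entry (arXiv:2004.09602) reviews the choice of quantization parameters. One common choice is symmetric ("scale") quantization to signed 8-bit integers, with the clipping range set to the largest magnitude observed (max calibration). The sketch below shows only that single configuration; it is not presented as the paper's recommended workflow, and the helper names quantize_int8 and dequantize are hypothetical.

```python
def quantize_int8(xs, amax=None):
    """Symmetric (scale) quantization of a list of floats to integers in [-127, 127]."""
    if amax is None:
        amax = max(abs(x) for x in xs)          # max calibration: clip at largest magnitude
    scale = 127.0 / amax if amax > 0 else 1.0   # maps [-amax, amax] onto [-127, 127]
    # Python's round() breaks ties to even; values beyond amax are clamped.
    q = [max(-127, min(127, round(x * scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to approximate real values."""
    return [v / scale for v in q]
```

For example, quantize_int8([0.5, -1.0, 0.25, 2.0]) yields a scale of 63.5 and codes [32, -64, 16, 127]; each dequantized value lies within half a quantization step (0.5 / scale) of its input.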
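Stripes, Dynamic Stripes, Tartan and Loom (arXiv:1706.00504, arXiv:1707.09068, arXiv:1706.07853) build on bit-serial computation, where execution time scales with the precision in bits. The toy software model below, with hypothetical function names, processes one activation bit per step, so the step count equals the precision; runtime_precision loosely mimics detecting a smaller precision at runtime for a group of values. It models only the arithmetic, not any of these hardware designs, and assumes unsigned integer activations.

```python
def bit_serial_dot(weights, activations, precision):
    """Inner product computed one activation bit per step, least significant bit first.

    Activations must be unsigned integers that fit in `precision` bits.
    Returns (result, steps), where steps == precision.
    """
    assert all(0 <= a < (1 << precision) for a in activations)
    acc = 0
    for bit in range(precision):                 # one step per activation bit
        # Sum the weights whose activation has this bit set...
        partial = sum(w for w, a in zip(weights, activations) if (a >> bit) & 1)
        acc += partial << bit                    # ...then weight that sum by 2**bit
    return acc, precision

def runtime_precision(activations):
    """Bits actually needed by this group of values (at least 1)."""
    return max(a.bit_length() for a in activations) or 1
```

For weights [3, -2, 5] and activations [6, 1, 4], runtime_precision returns 3, and bit_serial_dot at that precision returns (36, 3), the same result a fixed 8-bit precision would produce in 8 steps.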
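The Bit-pragmatic entry (arXiv:1610.06920) traces ineffectual computation to the multiple terms a conventional multiplier generates internally. Assuming, as in our reading of the paper, that the wasted terms are those produced by zero bits of the activation, a shift-and-add multiply that visits only the set ("essential") bits makes the saving visible. The names below are hypothetical, and this is plain arithmetic, not the PRA design.

```python
def essential_bit_offsets(a):
    """Positions of the 1 bits of a non-negative integer activation."""
    assert a >= 0
    return [i for i in range(a.bit_length()) if (a >> i) & 1]

def pragmatic_multiply(w, a):
    """Multiply w * a by summing shifted copies of w, one per set bit of a.

    Returns (product, terms), where terms counts the shifted additions performed.
    """
    offsets = essential_bit_offsets(a)
    return sum(w << i for i in offsets), len(offsets)
```

For example, pragmatic_multiply(3, 0b10000001) returns (387, 2): two terms instead of the eight a bit-parallel 8-bit multiplier would form, six of which would be zero.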