
Showing 1–6 of 6 results for author: Shomron, G

Searching in archive cs.
  1. arXiv:2105.11010 [pdf, other]

    cs.LG cs.AR cs.CV

    Post-Training Sparsity-Aware Quantization

    Authors: Gil Shomron, Freddy Gabbay, Samer Kurzum, Uri Weiser

    Abstract: Quantization is a technique used in deep neural networks (DNNs) to increase execution performance and hardware efficiency. Uniform post-training quantization (PTQ) methods are common, since they can be implemented efficiently in hardware and do not require extensive hardware resources or a training set. Mapping FP32 models to INT8 using uniform PTQ yields models with negligible accuracy degradatio…

    Submitted 28 October, 2021; v1 submitted 23 May, 2021; originally announced May 2021.

  2. arXiv:2010.05625 [pdf, ps, other]

    cs.LG

    Post-Training BatchNorm Recalibration

    Authors: Gil Shomron, Uri Weiser

    Abstract: We revisit non-blocking simultaneous multithreading (NB-SMT) introduced previously by Shomron and Weiser (2020). NB-SMT trades accuracy for performance by occasionally "squeezing" more than one thread into a shared multiply-and-accumulate (MAC) unit. However, the method of accommodating more than one thread in a shared MAC unit may contribute noise to the computations, thereby changing the interna…

    Submitted 12 October, 2020; originally announced October 2020.

  3. arXiv:2004.09309 [pdf, other]

    cs.LG cs.AR cs.CV eess.SP

    Non-Blocking Simultaneous Multithreading: Embracing the Resiliency of Deep Neural Networks

    Authors: Gil Shomron, Uri Weiser

    Abstract: Deep neural networks (DNNs) are known for their inability to utilize underlying hardware resources due to hardware susceptibility to sparse activations and weights. Even in finer granularities, many of the non-zero values hold a portion of zero-valued bits that may cause inefficiencies when executed on hardware. Inspired by conventional CPU simultaneous multithreading (SMT) that increases computer…

    Submitted 17 September, 2020; v1 submitted 17 April, 2020; originally announced April 2020.

    Comments: MICRO-53

  4. arXiv:2002.07686 [pdf, other]

    cs.LG cs.CV stat.ML

    Robust Quantization: One Model to Rule Them All

    Authors: Moran Shkolnik, Brian Chmiel, Ron Banner, Gil Shomron, Yury Nahshan, Alex Bronstein, Uri Weiser

    Abstract: Neural network quantization methods often involve simulating the quantization process during training, making the trained model highly dependent on the target bit-width and precise way quantization is performed. Robust quantization offers an alternative approach with improved tolerance to different classes of data-types and quantization policies. It opens up new exciting applications where the qua…

    Submitted 22 October, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

  5. arXiv:1909.07636 [pdf, other]

    cs.CV

    Thanks for Nothing: Predicting Zero-Valued Activations with Lightweight Convolutional Neural Networks

    Authors: Gil Shomron, Ron Banner, Moran Shkolnik, Uri Weiser

    Abstract: Convolutional neural networks (CNNs) introduce state-of-the-art results for various tasks with the price of high computational demands. Inspired by the observation that spatial correlation exists in CNN output feature maps (ofms), we propose a method to dynamically predict whether ofm activations are zero-valued or not according to their neighboring activation values, thereby avoiding zero-valued…

    Submitted 13 July, 2020; v1 submitted 17 September, 2019; originally announced September 2019.

  6. Spatial Correlation and Value Prediction in Convolutional Neural Networks

    Authors: Gil Shomron, Uri Weiser

    Abstract: Convolutional neural networks (CNNs) are a widely used form of deep neural networks, introducing state-of-the-art results for different problems such as image classification, computer vision tasks, and speech recognition. However, CNNs are compute intensive, requiring billions of multiply-accumulate (MAC) operations per input. To reduce the number of MACs in CNNs, we propose a value prediction met…

    Submitted 1 January, 2019; v1 submitted 21 July, 2018; originally announced July 2018.

    Comments: This paper has been accepted to IEEE Computer Architecture Letters (https://ieeexplore.ieee.org/document/8594568)
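Entry 1 builds on uniform post-training quantization (PTQ), in which FP32 values are mapped to INT8 with a single scale factor and no retraining. The sketch below shows a generic symmetric, per-tensor variant in plain Python as an illustration only; the function names are hypothetical, and it is not the sparsity-aware method proposed in arXiv:2105.11010.

```python
# Hypothetical sketch of uniform, symmetric, per-tensor PTQ (FP32 -> INT8).
# Generic textbook scheme; not the method of any paper listed above.

def quantize_int8(values):
    """Map floats to signed 8-bit integer codes using one shared scale."""
    max_abs = max(abs(v) for v in values)
    # Choose the scale so the largest magnitude maps to 127;
    # fall back to 1.0 for an all-zero tensor to avoid division by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer and clip to the INT8 range.
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate FP32 values from the INT8 codes."""
    return [x * scale for x in q]


weights = [0.5, -1.27, 0.0, 0.3]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
```

As long as no value is clipped, each dequantized value differs from the original by at most half a quantization step (scale / 2).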
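Entries 5 and 6 exploit spatial correlation in CNN output feature maps to avoid computing zero-valued activations. The following is a hypothetical illustration of that general idea, not the predictor from either paper: it computes activations exactly on a coarse grid (every other row and column), then predicts zero for each remaining position whose computed grid neighbors are all zero. The callable `compute(r, c)` is an assumed stand-in for the expensive MAC-based computation of one activation.

```python
# Hypothetical sketch: skip likely-zero activations in a ReLU feature map by
# computing a coarse grid first. Illustrative only; not the papers' scheme.

def predict_ofm(compute, height, width):
    """Return (ofm, skipped), where skipped counts computations avoided.

    compute(r, c) returns the exact (post-ReLU) activation at row r, column c.
    """
    ofm = [[None] * width for _ in range(height)]
    # Pass 1: compute exactly on the even-row, even-column grid.
    for r in range(0, height, 2):
        for c in range(0, width, 2):
            ofm[r][c] = compute(r, c)
    skipped = 0
    # Pass 2: every other position looks at its grid neighbors within distance 1.
    for r in range(height):
        for c in range(width):
            if ofm[r][c] is not None:
                continue  # already computed in pass 1
            neighbors = [
                ofm[rr][cc]
                for rr in (r - 1, r, r + 1)
                for cc in (c - 1, c, c + 1)
                if 0 <= rr < height and 0 <= cc < width
                and rr % 2 == 0 and cc % 2 == 0
            ]
            if all(v == 0 for v in neighbors):
                ofm[r][c] = 0  # predicted zero; exact value never computed
                skipped += 1
            else:
                ofm[r][c] = compute(r, c)
    return ofm, skipped


# Toy 4x6 map: a ReLU'd ramp that is zero in the left half.
ofm, skipped = predict_ofm(lambda r, c: max(0, c - 2), 4, 6)
```

In this toy ramp every prediction happens to be correct. In general, an isolated non-zero activation surrounded by zeros would be mispredicted as zero, which is the kind of accuracy-for-computation trade-off this style of prediction introduces.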
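Entry 2 notes that NB-SMT can add noise to the computations and thereby change a network's internal statistics. A common remedy, sketched here only as an assumption about the general approach (the truncated abstract does not give the paper's exact procedure), is to re-estimate BatchNorm's per-channel mean and variance on a small calibration set run through the modified network, then normalize with the refreshed statistics. All names are hypothetical; in a real framework one would update the BatchNorm layers' running statistics instead.

```python
import math

# Hypothetical sketch of post-training BatchNorm statistics recalibration.


def recalibrate_bn_stats(activations):
    """Re-estimate per-channel mean and (biased) variance.

    activations: list of samples, each a list holding one pre-BatchNorm value
    per channel, collected from the network as it will actually run
    (e.g., with the noisy shared-MAC behavior enabled).
    """
    n = len(activations)
    channels = len(activations[0])
    means = [sum(s[ch] for s in activations) / n for ch in range(channels)]
    variances = [
        sum((s[ch] - means[ch]) ** 2 for s in activations) / n
        for ch in range(channels)
    ]
    return means, variances


def batch_norm(x, means, variances, gamma, beta, eps=1e-5):
    """Normalize one sample with the given statistics, then scale and shift."""
    return [
        gamma[ch] * (x[ch] - means[ch]) / math.sqrt(variances[ch] + eps) + beta[ch]
        for ch in range(len(x))
    ]


calibration = [[1.0, 10.0], [3.0, 14.0]]
means, variances = recalibrate_bn_stats(calibration)
normalized = batch_norm([3.0, 12.0], means, variances, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

Only the normalization statistics change here; the learned scale (gamma) and shift (beta) parameters are left as they were, so no training data or gradient updates are needed.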