Showing 1–12 of 12 results for author: Weiser, U

  1. arXiv:2105.11010  [pdf, other]

    cs.LG cs.AR cs.CV

    Post-Training Sparsity-Aware Quantization

    Authors: Gil Shomron, Freddy Gabbay, Samer Kurzum, Uri Weiser

    Abstract: Quantization is a technique used in deep neural networks (DNNs) to increase execution performance and hardware efficiency. Uniform post-training quantization (PTQ) methods are common, since they can be implemented efficiently in hardware and do not require extensive hardware resources or a training set. Mapping FP32 models to INT8 using uniform PTQ yields models with negligible accuracy degradatio…

    Submitted 28 October, 2021; v1 submitted 23 May, 2021; originally announced May 2021.

  2. arXiv:2010.05625  [pdf, ps, other]

    cs.LG

    Post-Training BatchNorm Recalibration

    Authors: Gil Shomron, Uri Weiser

    Abstract: We revisit non-blocking simultaneous multithreading (NB-SMT) introduced previously by Shomron and Weiser (2020). NB-SMT trades accuracy for performance by occasionally "squeezing" more than one thread into a shared multiply-and-accumulate (MAC) unit. However, the method of accommodating more than one thread in a shared MAC unit may contribute noise to the computations, thereby changing the interna…

    Submitted 12 October, 2020; originally announced October 2020.

  3. arXiv:2005.06102  [pdf, other]

    cs.DC

    Semantic prefetching using forecast slices

    Authors: Leeor Peled, Uri Weiser, Yoav Etsion

    Abstract: Modern prefetchers identify memory access patterns in order to predict future accesses. However, many applications exhibit irregular access patterns that do not manifest spatio-temporal locality in the memory address space. Such applications usually do not fall under the scope of existing prefetching techniques, which observe only the stream of addresses dispatched by the memory unit but not the c…

    Submitted 12 May, 2020; originally announced May 2020.

    Comments: Under conference review

    ACM Class: C.1

  4. arXiv:2004.09309  [pdf, other]

    cs.LG cs.AR cs.CV eess.SP

    Non-Blocking Simultaneous Multithreading: Embracing the Resiliency of Deep Neural Networks

    Authors: Gil Shomron, Uri Weiser

    Abstract: Deep neural networks (DNNs) are known for their inability to utilize underlying hardware resources due to hardware susceptibility to sparse activations and weights. Even in finer granularities, many of the non-zero values hold a portion of zero-valued bits that may cause inefficiencies when executed on hardware. Inspired by conventional CPU simultaneous multithreading (SMT) that increases computer…

    Submitted 17 September, 2020; v1 submitted 17 April, 2020; originally announced April 2020.

    Comments: MICRO-53

  5. arXiv:2002.07686  [pdf, other]

    cs.LG cs.CV stat.ML

    Robust Quantization: One Model to Rule Them All

    Authors: Moran Shkolnik, Brian Chmiel, Ron Banner, Gil Shomron, Yury Nahshan, Alex Bronstein, Uri Weiser

    Abstract: Neural network quantization methods often involve simulating the quantization process during training, making the trained model highly dependent on the target bit-width and precise way quantization is performed. Robust quantization offers an alternative approach with improved tolerance to different classes of data-types and quantization policies. It opens up new exciting applications where the qua…

    Submitted 22 October, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

  6. arXiv:1909.07636  [pdf, other]

    cs.CV

    Thanks for Nothing: Predicting Zero-Valued Activations with Lightweight Convolutional Neural Networks

    Authors: Gil Shomron, Ron Banner, Moran Shkolnik, Uri Weiser

    Abstract: Convolutional neural networks (CNNs) introduce state-of-the-art results for various tasks with the price of high computational demands. Inspired by the observation that spatial correlation exists in CNN output feature maps (ofms), we propose a method to dynamically predict whether ofm activations are zero-valued or not according to their neighboring activation values, thereby avoiding zero-valued…

    Submitted 13 July, 2020; v1 submitted 17 September, 2019; originally announced September 2019.

  7. Spatial Correlation and Value Prediction in Convolutional Neural Networks

    Authors: Gil Shomron, Uri Weiser

    Abstract: Convolutional neural networks (CNNs) are a widely used form of deep neural networks, introducing state-of-the-art results for different problems such as image classification, computer vision tasks, and speech recognition. However, CNNs are compute intensive, requiring billions of multiply-accumulate (MAC) operations per input. To reduce the number of MACs in CNNs, we propose a value prediction met…

    Submitted 1 January, 2019; v1 submitted 21 July, 2018; originally announced July 2018.

    Comments: This paper has been accepted to IEEE Computer Architecture Letters (https://ieeexplore.ieee.org/document/8594568)

  8. A neural network memory prefetcher using semantic locality

    Authors: Leeor Peled, Uri Weiser, Yoav Etsion

    Abstract: Accurate memory prefetching is paramount for processor performance, and modern processors employ various techniques to identify and prefetch different memory access patterns. While most modern prefetchers target spatio-temporal patterns by matching memory addresses that are accessed in close proximity (either in space or time), the recently proposed concept of semantic locality views locality as a…

    Submitted 26 July, 2018; v1 submitted 19 March, 2018; originally announced April 2018.

    Comments: Newer version under conference submission

    Journal ref: ACM Trans. Archit. Code Optim., Vol. 16, No. 4, Article 37, Publication date: October 2019

  9. arXiv:1705.06923  [pdf]

    cs.AR

    MultiAmdahl: Optimal Resource Allocation in Heterogeneous Architectures

    Authors: Leonid Yavits, Amir Morad, Uri Weiser, Ran Ginosar

    Abstract: Future multiprocessor chips will integrate many different units, each tailored to a specific computation. When designing such a system, the chip architect must decide how to distribute limited system resources such as area, power, and energy among the computational units. We extend MultiAmdahl, an analytical optimization technique for resource allocation in heterogeneous architectures, for energy…

    Submitted 19 May, 2017; originally announced May 2017.

  10. A Resistive CAM Processing-in-Storage Architecture for DNA Sequence Alignment

    Authors: Roman Kaplan, Leonid Yavits, Ran Ginosar, Uri Weiser

    Abstract: A novel processing-in-storage (PRinS) architecture based on Resistive CAM (ReCAM) is described and proposed for Smith-Waterman (S-W) sequence alignment. The ReCAM massively-parallel compare operation finds matching base-pairs in a fixed number of cycles, regardless of sequence length. The ReCAM PRinS S-W algorithm is simulated and compared to FPGA, Xeon Phi and GPU-based implementations, showing a…

    Submitted 11 June, 2017; v1 submitted 17 January, 2017; originally announced January 2017.

    Journal ref: IEEE Micro, vol. 37, no. 4, pp. 20-28, 2017

  11. arXiv:1601.07815  [pdf]

    cs.DC cs.PF

    Convex Optimization of Real Time SoC

    Authors: L. Yavits, A. Morad, R. Ginosar, U. Weiser

    Abstract: Convex optimization methods are employed to optimize a real-time (RT) system-on-chip (SoC) under a variety of physical resource-driven constraints, demonstrated on an industry MPEG2 encoder SoC. The power optimization is compared to conventional performance-optimization framework, showing a factor of two and a half saving in power. Convex optimization is shown to be very efficient in a high-level…

    Submitted 19 May, 2017; v1 submitted 28 January, 2016; originally announced January 2016.

    Comments: 6 pages, 3 figures

  12. arXiv:1105.2960  [pdf, ps, other]

    cs.AR

    Multi-Amdahl: Optimal Resource Sharing with Multiple Program Execution Segments

    Authors: Tsahee Zidenberg, Isaac Keslassy, Uri Weiser

    Abstract: This paper presents Multi-Amdahl, a resource allocation analytical tool for heterogeneous systems. Our model includes multiple program execution segments, where each one is accelerated by a specific hardware unit. The acceleration speedup of the specific hardware unit is a function of a limited resource, such as the unit area, power, or energy. Using the Lagrange theorem we discover the optimal re…

    Submitted 15 May, 2011; originally announced May 2011.

    Comments: Technical Report

    Report number: TR11-03