Showing 1–4 of 4 results for author: Lascorz, A D

  1. arXiv:2201.08830  [pdf, other]

    cs.AR cs.LG

    APack: Off-Chip, Lossless Data Compression for Efficient Deep Learning Inference

    Authors: Alberto Delmas Lascorz, Mostafa Mahmoud, Andreas Moshovos

    Abstract: Data accesses between on- and off-chip memories account for a large fraction of overall energy consumption during inference with deep learning networks. We present APack, a simple and effective, lossless, off-chip memory compression technique for fixed-point quantized models. APack reduces data widths by exploiting the non-uniform value distribution in deep learning applications. APack can be used…

    Submitted 21 January, 2022; originally announced January 2022.

  2. arXiv:2002.03090  [pdf, other]

    cs.LG stat.ML

    BitPruning: Learning Bitlengths for Aggressive and Accurate Quantization

    Authors: Miloš Nikolić, Ghouthi Boukli Hacene, Ciaran Bannon, Alberto Delmas Lascorz, Matthieu Courbariaux, Yoshua Bengio, Vincent Gripon, Andreas Moshovos

    Abstract: Neural networks have demonstrably achieved state-of-the-art accuracy using low-bitlength integer quantization, yielding both execution time and energy benefits on existing hardware designs that support short bitlengths. However, the question of finding the minimum bitlength for a desired accuracy remains open. We introduce a training method for minimizing inference bitlength at any granularity whi…

    Submitted 11 August, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

  3. arXiv:1805.04513  [pdf, ps, other]

    cs.NE cs.AR cs.LG

    Laconic Deep Learning Computing

    Authors: Sayeh Sharify, Mostafa Mahmoud, Alberto Delmas Lascorz, Milos Nikolic, Andreas Moshovos

    Abstract: We motivate a method for transparently identifying ineffectual computations in unmodified Deep Learning models and without affecting accuracy. Specifically, we show that if we decompose multiplications down to the bit level the amount of work performed during inference for image classification models can be consistently reduced by two orders of magnitude. In the best case studied of a sparse varia…

    Submitted 10 May, 2018; originally announced May 2018.

  4. arXiv:1706.07853  [pdf, ps, other]

    cs.DC cs.AR cs.LG

    Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks

    Authors: Sayeh Sharify, Alberto Delmas Lascorz, Kevin Siu, Patrick Judd, Andreas Moshovos

    Abstract: Loom (LM), a hardware inference accelerator for Convolutional Neural Networks (CNNs), is presented. In LM every bit of data precision that can be saved translates to proportional performance gains. Specifically, for convolutional layers LM's execution time scales inversely proportionally with the precisions of both weights and activations. For fully-connected layers LM's performance scales inversel…

    Submitted 16 May, 2018; v1 submitted 23 June, 2017; originally announced June 2017.