Showing 1–6 of 6 results for author: Lew, L

Searching in archive cs.
  1. arXiv:2405.05171

    cs.LG

    Custom Gradient Estimators are Straight-Through Estimators in Disguise

    Authors: Matt Schoenbauer, Daniele Moro, Lukasz Lew, Andrew Howard

    Abstract: Quantization-aware training comes with a fundamental challenge: the derivative of quantization functions such as rounding are zero almost everywhere and nonexistent elsewhere. Various differentiable approximations of quantization functions have been proposed to address this issue. In this paper, we prove that when the learning rate is sufficiently small, a large class of weight gradient estimators…

    Submitted 22 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  2. arXiv:2404.00103

    cs.LG cs.CV

    PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks

    Authors: Marina Neseem, Conor McCullough, Randy Hsin, Chas Leichner, Shan Li, In Suk Chong, Andrew G. Howard, Lukasz Lew, Sherief Reda, Ville-Mikko Rautio, Daniele Moro

    Abstract: Low-precision quantization is recognized for its efficacy in neural network optimization. Our analysis reveals that non-quantized elementwise operations which are prevalent in layers such as parameterized activation functions, batch normalization, and quantization scaling dominate the inference cost of low-precision models. These non-quantized elementwise operations are commonly overlooked in SOTA…

    Submitted 29 March, 2024; originally announced April 2024.

    Comments: Accepted in CVPR 2024. 10 Figures, 9 Tables

  3. arXiv:2302.04907

    cs.CL cs.LG

    Binarized Neural Machine Translation

    Authors: Yichi Zhang, Ankush Garg, Yuan Cao, Łukasz Lew, Behrooz Ghorbani, Zhiru Zhang, Orhan Firat

    Abstract: The rapid scaling of language models is motivating research using low-bitwidth quantization. In this work, we propose a novel binarization technique for Transformers applied to machine translation (BMT), the first of its kind. We identify and address the problem of inflated dot-product variance when using one-bit weights and activations. Specifically, BMT leverages additional LayerNorms and residu…

    Submitted 9 February, 2023; originally announced February 2023.

    Journal ref: Published at NeurIPS 2023

  4. arXiv:2203.15952

    eess.AS cs.LG

    4-bit Conformer with Native Quantization Aware Training for Speech Recognition

    Authors: Shaojin Ding, Phoenix Meadowlark, Yanzhang He, Lukasz Lew, Shivani Agrawal, Oleg Rybakov

    Abstract: Reducing the latency and model size has always been a significant research problem for live Automatic Speech Recognition (ASR) application scenarios. Along this direction, model quantization has become an increasingly popular approach to compress neural networks and reduce computation cost. Most of the existing practical ASR systems apply post-training 8-bit quantization. To achieve a higher compr…

    Submitted 2 March, 2023; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: Published at INTERSPEECH 2022

  5. arXiv:2112.00133

    cs.LG cs.CV

    PokeBNN: A Binary Pursuit of Lightweight Accuracy

    Authors: Yichi Zhang, Zhiru Zhang, Lukasz Lew

    Abstract: Optimization of Top-1 ImageNet promotes enormous networks that may be impractical in inference settings. Binary neural networks (BNNs) have the potential to significantly lower the compute intensity but existing models suffer from low quality. To overcome this deficiency, we propose PokeConv, a binary convolution block which improves quality of BNNs by techniques such as adding multiple residual p…

    Submitted 28 April, 2022; v1 submitted 30 November, 2021; originally announced December 2021.

    Comments: Accepted to CVPR 2022

  6. Pareto-Optimal Quantized ResNet Is Mostly 4-bit

    Authors: AmirAli Abdolrashidi, Lisa Wang, Shivani Agrawal, Jonathan Malmaud, Oleg Rybakov, Chas Leichner, Lukasz Lew

    Abstract: Quantization has become a popular technique to compress neural networks and reduce compute cost, but most prior work focuses on studying quantization without changing the network size. Many real-world applications of neural networks have compute cost and memory budgets, which can be traded off with model quality by changing the number of parameters. In this work, we use ResNet as a case study to s…

    Submitted 7 May, 2021; originally announced May 2021.

    Comments: 8 pages. Accepted at the Efficient Deep Learning for Computer Vision Workshop at CVPR 2021