Showing 1–5 of 5 results for author: Bondarenko, Y

Searching in archive cs.
  1. arXiv:2406.06385

    cs.LG cs.AI cs.CL

    Low-Rank Quantization-Aware Training for LLMs

    Authors: Yelysei Bondarenko, Riccardo Del Chiaro, Markus Nagel

    Abstract: Large language models (LLMs) are omnipresent; however, their practical deployment is challenging due to their ever-increasing computational and memory demands. Quantization is one of the most effective ways to make them more compute and memory efficient. Quantization-aware training (QAT) methods generally produce the best quantized performance; however, this comes at the cost of potentially long trai…

    Submitted 20 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  2. arXiv:2306.12929

    cs.LG cs.AI cs.CL cs.CV

    Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing

    Authors: Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort

    Abstract: Transformer models have been widely adopted in various domains over the last years, and especially large language models have advanced the field of AI significantly. Due to their size, the capability of these networks has increased tremendously, but this has come at the cost of a significant increase in necessary compute. Quantization is one of the most effective ways to reduce the computational t…

    Submitted 9 November, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

  3. arXiv:2203.11086

    cs.LG

    Overcoming Oscillations in Quantization-Aware Training

    Authors: Markus Nagel, Marios Fournarakis, Yelysei Bondarenko, Tijmen Blankevoort

    Abstract: When training neural networks with simulated quantization, we observe that quantized weights can, rather unexpectedly, oscillate between two grid-points. The importance of this effect and its impact on quantization-aware training (QAT) are not well-understood or investigated in literature. In this paper, we delve deeper into the phenomenon of weight oscillations and show that it can lead to a sign…

    Submitted 28 June, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: Published as oral paper at ICML 2022

  4. arXiv:2109.12948

    cs.LG cs.AI cs.CL

    Understanding and Overcoming the Challenges of Efficient Transformer Quantization

    Authors: Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort

    Abstract: Transformer-based architectures have become the de-facto standard models for a wide range of Natural Language Processing tasks. However, their memory footprint and high latency are prohibitive for efficient deployment and inference on resource-limited devices. In this work, we explore quantization for transformers. We show that transformers have unique quantization challenges -- namely, high dynam…

    Submitted 27 September, 2021; originally announced September 2021.

  5. arXiv:2106.08295

    cs.LG cs.AI cs.CV

    A White Paper on Neural Network Quantization

    Authors: Markus Nagel, Marios Fournarakis, Rana Ali Amjad, Yelysei Bondarenko, Mart van Baalen, Tijmen Blankevoort

    Abstract: While neural networks have advanced the frontiers in many applications, they often come at a high computational cost. Reducing the power and latency of neural network inference is key if we want to integrate modern networks into edge devices with strict power and compute requirements. Neural network quantization is one of the most effective ways of achieving these savings but the additional noise…

    Submitted 15 June, 2021; originally announced June 2021.