Showing 1–16 of 16 results for author: Kurtić, E

  1. arXiv:2406.12572  [pdf, other]

    cs.CL cs.AI cs.LG

    Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models

    Authors: Eldar Kurtic, Amir Moeini, Dan Alistarh

    Abstract: We introduce Mathador-LM, a new benchmark for evaluating the mathematical reasoning on large language models (LLMs), combining ruleset interpretation, planning, and problem-solving. This benchmark is inspired by the Mathador game, where the objective is to reach a target number using basic arithmetic operations on a given set of base numbers, following a simple set of rules. We show that, across l…

    Submitted 19 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    ACM Class: I.2.7

  2. arXiv:2405.15593  [pdf, other]

    cs.LG math.NA

    MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence

    Authors: Ionut-Vlad Modoranu, Mher Safaryan, Grigory Malinovsky, Eldar Kurtic, Thomas Robert, Peter Richtarik, Dan Alistarh

    Abstract: We propose a new variant of the Adam optimizer [Kingma and Ba, 2014] called MICROADAM that specifically minimizes memory overheads, while maintaining theoretical convergence guarantees. We achieve this by compressing the gradient information before it is fed into the optimizer state, thereby reducing its memory footprint significantly. We control the resulting compression error via a novel instanc…

    Submitted 24 May, 2024; originally announced May 2024.

  3. arXiv:2405.03594  [pdf, other]

    cs.CL cs.AI

    Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment

    Authors: Abhinav Agarwalla, Abhay Gupta, Alexandre Marques, Shubhra Pandit, Michael Goin, Eldar Kurtic, Kevin Leong, Tuan Nguyen, Mahmoud Salem, Dan Alistarh, Sean Lie, Mark Kurtz

    Abstract: Large language models (LLMs) have revolutionized Natural Language Processing (NLP), but their size creates computational bottlenecks. We introduce a novel approach to create accurate, sparse foundational versions of performant LLMs that achieve full accuracy recovery for fine-tuning tasks at up to 70% sparsity. We achieve this for the LLaMA-2 7B model by combining the SparseGPT one-shot pruning me…

    Submitted 6 May, 2024; originally announced May 2024.

  4. arXiv:2312.13547  [pdf, other]

    cs.CL

    How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark

    Authors: Eldar Kurtic, Torsten Hoefler, Dan Alistarh

    Abstract: Pruning large language models (LLMs) from the BERT family has emerged as a standard compression benchmark, and several pruning methods have been proposed for this task. The recent "Sparsity May Cry" (SMC) benchmark put into question the validity of all existing methods, exhibiting a more complex setup where many known pruning methods appear to fail. We revisit the question of accurate BERT-pruni…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: Accepted as oral to CPAL 2024

  5. arXiv:2310.06927  [pdf, other]

    cs.CL cs.AI

    Sparse Fine-tuning for Inference Acceleration of Large Language Models

    Authors: Eldar Kurtic, Denis Kuznedelev, Elias Frantar, Michael Goin, Dan Alistarh

    Abstract: We consider the problem of accurate sparse fine-tuning of large language models (LLMs), that is, fine-tuning pretrained LLMs on specialized tasks, while inducing sparsity in their weights. On the accuracy side, we observe that standard loss-based fine-tuning may fail to recover accuracy, especially at high sparsities. To address this, we perform a detailed study of distillation-type losses, determ…

    Submitted 13 October, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

  6. arXiv:2308.02060  [pdf, other]

    cs.LG cs.AI

    Accurate Neural Network Pruning Requires Rethinking Sparse Optimization

    Authors: Denis Kuznedelev, Eldar Kurtic, Eugenia Iofinova, Elias Frantar, Alexandra Peste, Dan Alistarh

    Abstract: Obtaining versions of deep neural networks that are both highly-accurate and highly-sparse is one of the main challenges in the area of model compression, and several high-performance pruning techniques have been investigated by the community. Yet, much less is known about the interaction between sparsity and the standard stochastic optimization techniques used for training sparse networks, and mo…

    Submitted 8 September, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

  7. arXiv:2306.06098  [pdf, other]

    cs.LG math.NA math.OC

    Error Feedback Can Accurately Compress Preconditioners

    Authors: Ionut-Vlad Modoranu, Aleksei Kalinov, Eldar Kurtic, Elias Frantar, Dan Alistarh

    Abstract: Leveraging second-order information about the loss at the scale of deep networks is one of the main lines of approach for improving the performance of current optimizers for deep learning. Yet, existing approaches for accurate full-matrix preconditioning, such as Full-Matrix Adagrad (GGT) or Matrix-Free Approximate Curvature (M-FAC) suffer from massive storage costs when applied even to small-scal…

    Submitted 5 June, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

  8. arXiv:2303.14409  [pdf, other]

    cs.CV

    Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression

    Authors: Denis Kuznedelev, Soroush Tabesh, Kimia Noorbakhsh, Elias Frantar, Sara Beery, Eldar Kurtic, Dan Alistarh

    Abstract: Recent vision architectures and self-supervised training methods enable vision models that are extremely accurate and general, but come with massive parameter and computational costs. In practical settings, such as camera traps, users have limited resources, and may fine-tune a pretrained model on (often limited) data from a small set of specific categories of interest. These users may wish to mak…

    Submitted 25 March, 2023; originally announced March 2023.

    MSC Class: 68T07 ACM Class: I.m

  9. arXiv:2302.04852  [pdf, other]

    cs.LG

    SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks

    Authors: Mahdi Nikdan, Tommaso Pegolotti, Eugenia Iofinova, Eldar Kurtic, Dan Alistarh

    Abstract: We provide a new efficient version of the backpropagation algorithm, specialized to the case where the weights of the neural network being trained are sparse. Our algorithm is general, as it applies to arbitrary (unstructured) sparsity and common layer types (e.g., convolutional or linear). We provide a fast vectorized implementation on commodity CPUs, and show that it can yield speedups in end-to…

    Submitted 9 February, 2023; originally announced February 2023.

  10. arXiv:2302.04089  [pdf, other]

    cs.LG cs.CL

    ZipLM: Inference-Aware Structured Pruning of Language Models

    Authors: Eldar Kurtic, Elias Frantar, Dan Alistarh

    Abstract: The breakthrough performance of large language models (LLMs) comes with major computational footprints and high deployment costs. In this paper, we progress towards resolving this problem by proposing a novel structured compression approach for LLMs, called ZipLM. ZipLM achieves state-of-the-art accuracy-vs-speedup, while matching a set of desired target runtime speedups in any given inference env…

    Submitted 26 October, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

    Comments: Accepted to NeurIPS 2023

  11. arXiv:2210.09223  [pdf, other]

    cs.CV cs.LG

    CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models

    Authors: Denis Kuznedelev, Eldar Kurtic, Elias Frantar, Dan Alistarh

    Abstract: Driven by significant improvements in architectural design and training pipelines, computer vision has recently experienced dramatic progress in terms of accuracy on classic benchmarks such as ImageNet. These highly-accurate models are challenging to deploy, as they appear harder to compress using standard techniques such as pruning. We address this issue by introducing the Correlation Aware Prune…

    Submitted 31 May, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

    MSC Class: 68T07 ACM Class: I.m

  12. arXiv:2210.06384  [pdf, other]

    cs.CL

    GMP*: Well-Tuned Gradual Magnitude Pruning Can Outperform Most BERT-Pruning Methods

    Authors: Eldar Kurtic, Dan Alistarh

    Abstract: We revisit the performance of the classic gradual magnitude pruning (GMP) baseline for large language models, focusing on the classic BERT benchmark on various popular tasks. Despite existing evidence in the literature that GMP performs poorly, we show that a simple and general variant, which we call GMP*, can match and sometimes outperform more complex state-of-the-art methods. Our results provid…

    Submitted 8 December, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

  13. arXiv:2209.03990  [pdf, other]

    cs.AI

    Vision for Bosnia and Herzegovina in Artificial Intelligence Age: Global Trends, Potential Opportunities, Selected Use-cases and Realistic Goals

    Authors: Zlatan Ajanović, Emina Aličković, Aida Branković, Sead Delalić, Eldar Kurtić, Salem Malikić, Adnan Mehonić, Hamza Merzić, Kenan Šehić, Bahrudin Trbalić

    Abstract: Artificial Intelligence (AI) is one of the most promising technologies of the 21st century, with an already noticeable impact on society and the economy. With this work, we provide a short overview of global trends, applications in industry and selected use-cases from our international experience and work in industry and academia. The goal is to present global and regional positive practices and pr…

    Submitted 8 September, 2022; originally announced September 2022.

    Comments: 25 pages, 3 figures, Bosnian language. Presented at Naucno-strucna konferencija o umjetnoj inteligenciji. Federalno ministarstvo obrazovanja i nauke, Mostar, Bosna i Hercegovina, April 2022

  14. arXiv:2207.14200  [pdf, other]

    cs.LG

    CrAM: A Compression-Aware Minimizer

    Authors: Alexandra Peste, Adrian Vladu, Eldar Kurtic, Christoph H. Lampert, Dan Alistarh

    Abstract: Deep neural networks (DNNs) often have to be compressed, via pruning and/or quantization, before they can be deployed in practical settings. In this work we propose a new compression-aware minimizer dubbed CrAM that modifies the optimization step in a principled way, in order to produce models whose local loss behavior is stable under compression operations such as pruning. Thus, dense models trai…

    Submitted 4 May, 2023; v1 submitted 28 July, 2022; originally announced July 2022.

    Comments: Accepted to ICLR 2023

  15. arXiv:2203.07259  [pdf, other]

    cs.CL cs.LG

    The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models

    Authors: Eldar Kurtic, Daniel Campos, Tuan Nguyen, Elias Frantar, Mark Kurtz, Benjamin Fineran, Michael Goin, Dan Alistarh

    Abstract: Transformer-based language models have become a key building block for natural language processing. While these models are extremely accurate, they can be too large and computationally intensive to run on standard deployments. A variety of compression methods, including distillation, quantization, structured and unstructured pruning are known to decrease model size and increase inference speed, wi…

    Submitted 17 October, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

    Comments: Accepted to EMNLP 2022

  16. arXiv:2107.03356  [pdf, other]

    cs.LG

    M-FAC: Efficient Matrix-Free Approximations of Second-Order Information

    Authors: Elias Frantar, Eldar Kurtic, Dan Alistarh

    Abstract: Efficiently approximating local curvature information of the loss function is a key tool for optimization and compression of deep neural networks. Yet, most existing methods to approximate second-order information have high computational or storage costs, which can limit their practicality. In this work, we investigate matrix-free, linear-time approaches for estimating Inverse-Hessian Vector Produ…

    Submitted 18 November, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

    Comments: Accepted to NeurIPS 2021