Showing 1–13 of 13 results for author: Kuznedelev, D

  1. arXiv:2405.17261  [pdf, other]

    eess.IV cs.CV

    Does Diffusion Beat GAN in Image Super Resolution?

    Authors: Denis Kuznedelev, Valerii Startsev, Daniil Shlenskii, Sergey Kastryulin

    Abstract: There is a prevalent opinion in the recent literature that Diffusion-based models outperform GAN-based counterparts on the Image Super Resolution (ISR) problem. However, in most studies, Diffusion-based ISR models were trained longer and utilized larger networks than the GAN baselines. This raises the question of whether the superiority of Diffusion models is due to the Diffusion paradigm being be…

    Submitted 27 May, 2024; originally announced May 2024.

  2. arXiv:2405.14852  [pdf, other]

    cs.LG

    PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression

    Authors: Vladimir Malinovskii, Denis Mazur, Ivan Ilin, Denis Kuznedelev, Konstantin Burlachenko, Kai Yi, Dan Alistarh, Peter Richtarik

    Abstract: There has been significant interest in "extreme" compression of large language models (LLMs), i.e., to 1-2 bits per parameter, which allows such models to be executed efficiently on resource-constrained devices. Existing work focused on improved one-shot quantization techniques and weight representations; yet, purely post-training approaches are reaching diminishing returns in terms of the accurac…

    Submitted 30 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Preprint

  3. arXiv:2404.05666  [pdf, other]

    cs.CV

    YaART: Yet Another ART Rendering Technology

    Authors: Sergey Kastryulin, Artem Konev, Alexander Shishenya, Eugene Lyapustin, Artem Khurshudov, Alexander Tselousov, Nikita Vinokurov, Denis Kuznedelev, Alexander Markovich, Grigoriy Livshits, Alexey Kirillov, Anastasiia Tabisheva, Liubov Chubarova, Marina Kaminskaia, Alexander Ustyuzhanin, Artemii Shvetsov, Daniil Shlenskii, Valerii Startsev, Dmitrii Kornilov, Mikhail Romanov, Artem Babenko, Sergei Ovcharenko, Valentin Khrulkov

    Abstract: In the rapidly progressing field of generative models, the development of efficient and high-fidelity text-to-image diffusion systems represents a significant frontier. This study introduces YaART, a novel production-grade text-to-image cascaded diffusion model aligned to human preferences using Reinforcement Learning from Human Feedback (RLHF). During the development of YaART, we especially focus…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Prompts and additional information are available on the project page, see https://ya.ru/ai/art/paper-yaart-v1

  4. arXiv:2401.06118  [pdf, other]

    cs.LG cs.CL

    Extreme Compression of Large Language Models via Additive Quantization

    Authors: Vage Egiazarian, Andrei Panferov, Denis Kuznedelev, Elias Frantar, Artem Babenko, Dan Alistarh

    Abstract: The emergence of accurate open large language models (LLMs) has led to a race towards performant quantization techniques which can enable their execution on end-user devices. In this paper, we revisit the problem of "extreme" LLM compression -- defined as targeting extremely low bit counts, such as 2 to 3 bits per parameter -- from the point of view of classic methods in Multi-Codebook Quantizat…

    Submitted 8 June, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: ICML, 2024

  5. arXiv:2310.06927  [pdf, other]

    cs.CL cs.AI

    Sparse Fine-tuning for Inference Acceleration of Large Language Models

    Authors: Eldar Kurtic, Denis Kuznedelev, Elias Frantar, Michael Goin, Dan Alistarh

    Abstract: We consider the problem of accurate sparse fine-tuning of large language models (LLMs), that is, fine-tuning pretrained LLMs on specialized tasks, while inducing sparsity in their weights. On the accuracy side, we observe that standard loss-based fine-tuning may fail to recover accuracy, especially at high sparsities. To address this, we perform a detailed study of distillation-type losses, determ…

    Submitted 13 October, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

  6. arXiv:2308.02060  [pdf, other]

    cs.LG cs.AI

    Accurate Neural Network Pruning Requires Rethinking Sparse Optimization

    Authors: Denis Kuznedelev, Eldar Kurtic, Eugenia Iofinova, Elias Frantar, Alexandra Peste, Dan Alistarh

    Abstract: Obtaining versions of deep neural networks that are both highly-accurate and highly-sparse is one of the main challenges in the area of model compression, and several high-performance pruning techniques have been investigated by the community. Yet, much less is known about the interaction between sparsity and the standard stochastic optimization techniques used for training sparse networks, and mo…

    Submitted 8 September, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

  7. arXiv:2306.03078  [pdf, other]

    cs.CL cs.LG

    SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

    Authors: Tim Dettmers, Ruslan Svirschevski, Vage Egiazarian, Denis Kuznedelev, Elias Frantar, Saleh Ashkboos, Alexander Borzunov, Torsten Hoefler, Dan Alistarh

    Abstract: Recent advances in large language model (LLM) pretraining have led to high-quality LLMs with impressive abilities. By compressing such LLMs via quantization to 3-4 bits per parameter, they can fit into memory-limited devices such as laptops and mobile phones, enabling personalized use. However, quantization down to 3-4 bits per parameter usually leads to moderate-to-high accuracy losses, especiall…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: Extended preprint

  8. arXiv:2303.14409  [pdf, other]

    cs.CV

    Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression

    Authors: Denis Kuznedelev, Soroush Tabesh, Kimia Noorbakhsh, Elias Frantar, Sara Beery, Eldar Kurtic, Dan Alistarh

    Abstract: Recent vision architectures and self-supervised training methods enable vision models that are extremely accurate and general, but come with massive parameter and computational costs. In practical settings, such as camera traps, users have limited resources, and may fine-tune a pretrained model on (often limited) data from a small set of specific categories of interest. These users may wish to mak…

    Submitted 25 March, 2023; originally announced March 2023.

    MSC Class: 68T07 ACM Class: I.m

  9. arXiv:2302.13875  [pdf, other]

    cs.LG stat.ML

    Evaluating Robustness and Uncertainty of Graph Models Under Structural Distributional Shifts

    Authors: Gleb Bazhenov, Denis Kuznedelev, Andrey Malinin, Artem Babenko, Liudmila Prokhorenkova

    Abstract: In reliable decision-making systems based on machine learning, models have to be robust to distributional shifts or provide the uncertainty of their predictions. In node-level problems of graph learning, distributional shifts can be especially complex since the samples are interdependent. To evaluate the performance of graph models, it is important to test them on diverse and meaningful distributi…

    Submitted 1 November, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

  10. arXiv:2302.11640  [pdf, ps, other]

    cs.LG

    A critical look at the evaluation of GNNs under heterophily: Are we really making progress?

    Authors: Oleg Platonov, Denis Kuznedelev, Michael Diskin, Artem Babenko, Liudmila Prokhorenkova

    Abstract: Node classification is a classical graph machine learning task on which Graph Neural Networks (GNNs) have recently achieved strong results. However, it is often believed that standard GNNs only work well for homophilous graphs, i.e., graphs where edges tend to connect nodes of the same class. Graphs without this property are called heterophilous, and it is typically assumed that specialized method…

    Submitted 2 March, 2024; v1 submitted 22 February, 2023; originally announced February 2023.

  11. arXiv:2210.09223  [pdf, other]

    cs.CV cs.LG

    CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models

    Authors: Denis Kuznedelev, Eldar Kurtic, Elias Frantar, Dan Alistarh

    Abstract: Driven by significant improvements in architectural design and training pipelines, computer vision has recently experienced dramatic progress in terms of accuracy on classic benchmarks such as ImageNet. These highly-accurate models are challenging to deploy, as they appear harder to compress using standard techniques such as pruning. We address this issue by introducing the Correlation Aware Prune…

    Submitted 31 May, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

    MSC Class: 68T07 ACM Class: I.m

  12. arXiv:2209.06177  [pdf, other]

    cs.SI cs.DM cs.LG math.PR

    Characterizing Graph Datasets for Node Classification: Homophily-Heterophily Dichotomy and Beyond

    Authors: Oleg Platonov, Denis Kuznedelev, Artem Babenko, Liudmila Prokhorenkova

    Abstract: Homophily is a graph property describing the tendency of edges to connect similar nodes; the opposite is called heterophily. It is often believed that heterophilous graphs are challenging for standard message-passing graph neural networks (GNNs), and much effort has been put into developing efficient methods for this setting. However, there is no universally agreed-upon measure of homophily in the…

    Submitted 15 April, 2024; v1 submitted 13 September, 2022; originally announced September 2022.

  13. arXiv:2206.11124  [pdf, other]

    cs.LG math.OC stat.ML

    A view of mini-batch SGD via generating functions: conditions of convergence, phase transitions, benefit from negative momenta

    Authors: Maksim Velikanov, Denis Kuznedelev, Dmitry Yarotsky

    Abstract: Mini-batch SGD with momentum is a fundamental algorithm for learning large predictive models. In this paper we develop a new analytic framework to analyze noise-averaged properties of mini-batch SGD for linear models at constant learning rates, momenta and sizes of batches. Our key idea is to consider the dynamics of the second moments of model parameters for a special family of "Spectrally Expres…

    Submitted 9 March, 2023; v1 submitted 22 June, 2022; originally announced June 2022.

    Comments: The revised version accepted at ICLR 2023