Showing 1–10 of 10 results for author: Peste, A

  1. arXiv:2312.06872  [pdf, other]

    cs.LG

    ELSA: Partial Weight Freezing for Overhead-Free Sparse Network Deployment

    Authors: Paniz Halvachi, Alexandra Peste, Dan Alistarh, Christoph H. Lampert

    Abstract: We present ELSA, a practical solution for creating deep networks that can easily be deployed at different levels of sparsity. The core idea is to embed one or more sparse networks within a single dense network as a proper subset of the weights. At prediction time, any sparse model can be extracted effortlessly simply by zeroing out weights according to a predefined mask. ELSA is simple, powerful a…

    Submitted 17 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: updated to reflect PackNet prior work

  2. arXiv:2308.02060  [pdf, other]

    cs.LG cs.AI

    Accurate Neural Network Pruning Requires Rethinking Sparse Optimization

    Authors: Denis Kuznedelev, Eldar Kurtic, Eugenia Iofinova, Elias Frantar, Alexandra Peste, Dan Alistarh

    Abstract: Obtaining versions of deep neural networks that are both highly-accurate and highly-sparse is one of the main challenges in the area of model compression, and several high-performance pruning techniques have been investigated by the community. Yet, much less is known about the interaction between sparsity and the standard stochastic optimization techniques used for training sparse networks, and mo…

    Submitted 8 September, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

  3. arXiv:2305.17581  [pdf, other]

    cs.LG math.OC

    Knowledge Distillation Performs Partial Variance Reduction

    Authors: Mher Safaryan, Alexandra Peste, Dan Alistarh

    Abstract: Knowledge distillation is a popular approach for enhancing the performance of "student" models, with lower representational capacity, by taking advantage of more powerful "teacher" models. Despite its apparent simplicity and widespread use, the underlying mechanics behind knowledge distillation (KD) are still not fully understood. In this work, we shed new light on the inner workings of this m…

    Submitted 8 December, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

    Comments: 15+22 pages, NeurIPS 2023

  4. arXiv:2304.12622  [pdf, other]

    cs.CV cs.LG

    Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures

    Authors: Eugenia Iofinova, Alexandra Peste, Dan Alistarh

    Abstract: Pruning - that is, setting a significant subset of the parameters of a neural network to zero - is one of the most popular methods of model compression. Yet, several recent works have raised the issue that pruning may induce or exacerbate bias in the output of the compressed model. Despite existing evidence for this phenomenon, the relationship between neural network pruning and induced bias is no…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: 8 Pages / 49 with references and appendix. Accepted to CVPR 2023

  5. arXiv:2207.14200  [pdf, other]

    cs.LG

    CrAM: A Compression-Aware Minimizer

    Authors: Alexandra Peste, Adrian Vladu, Eldar Kurtic, Christoph H. Lampert, Dan Alistarh

    Abstract: Deep neural networks (DNNs) often have to be compressed, via pruning and/or quantization, before they can be deployed in practical settings. In this work we propose a new compression-aware minimizer dubbed CrAM that modifies the optimization step in a principled way, in order to produce models whose local loss behavior is stable under compression operations such as pruning. Thus, dense models trai…

    Submitted 4 May, 2023; v1 submitted 28 July, 2022; originally announced July 2022.

    Comments: Accepted to ICLR 2023

  6. arXiv:2111.13445  [pdf, other]

    cs.CV cs.AI cs.LG

    How Well Do Sparse Imagenet Models Transfer?

    Authors: Eugenia Iofinova, Alexandra Peste, Mark Kurtz, Dan Alistarh

    Abstract: Transfer learning is a classic paradigm by which models pretrained on large "upstream" datasets are adapted to yield good results on "downstream" specialized datasets. Generally, more accurate models on the "upstream" dataset tend to provide better transfer accuracy "downstream". In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (…

    Submitted 21 April, 2022; v1 submitted 26 November, 2021; originally announced November 2021.

    Comments: Accepted to CVPR'22. This version: 25 pages, 9 figures (including appendix). **Includes extended upstream training results, which are not present in the CVPR version.**

  7. arXiv:2107.03860  [pdf, other]

    cs.LG stat.ML

    SSSE: Efficiently Erasing Samples from Trained Machine Learning Models

    Authors: Alexandra Peste, Dan Alistarh, Christoph H. Lampert

    Abstract: The availability of large amounts of user-provided data has been key to the success of machine learning for many real-world tasks. Recently, an increasing awareness has emerged that users should be given more control about how their data is used. In particular, users should have the right to prohibit the use of their data for training machine learning systems, and to have it erased from already tr…

    Submitted 8 July, 2021; originally announced July 2021.

  8. arXiv:2106.12379  [pdf, other]

    cs.LG cs.AI

    AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural Networks

    Authors: Alexandra Peste, Eugenia Iofinova, Adrian Vladu, Dan Alistarh

    Abstract: The increasing computational requirements of deep neural networks (DNNs) have led to significant interest in obtaining DNN models that are sparse, yet accurate. Recent work has investigated the even harder case of sparse training, where the DNN weights are, for as much as possible, already sparse to reduce computational costs during training. Existing sparse training methods are often empirical an…

    Submitted 15 December, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

    Comments: Accepted at NeurIPS 2021

  9. arXiv:2102.00554  [pdf, other]

    cs.LG cs.AI cs.AR cs.CV cs.NE

    Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks

    Authors: Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, Alexandra Peste

    Abstract: The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their biological counterparts, sparse networks generalize just as well, if not better than, the original dense networks. Sparsity can reduce the memory footprint of regular networks to fit mobile devices, as well as shorten traini…

    Submitted 31 January, 2021; originally announced February 2021.

    Comments: 90 pages, 26 figures

  10. arXiv:1807.01889  [pdf, other]

    cs.LG stat.ML

    Learning in Variational Autoencoders with Kullback-Leibler and Renyi Integral Bounds

    Authors: Septimia Sârbu, Riccardo Volpi, Alexandra Peşte, Luigi Malagò

    Abstract: In this paper we propose two novel bounds for the log-likelihood based on Kullback-Leibler and the Rényi divergences, which can be used for variational inference and in particular for the training of Variational AutoEncoders. Our proposal is motivated by the difficulties encountered in training VAEs on continuous datasets with high contrast images, such as those with handwritten digits and charact…

    Submitted 5 July, 2018; originally announced July 2018.

    Comments: accepted at the ICML 2018 workshop on Theoretical Foundations and Applications of Deep Generative Models, Stockholm, Sweden, 2018