Showing 1–20 of 20 results for author: Alistarh, D

  1. arXiv:2310.20452  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    AsGrad: A Sharp Unified Analysis of Asynchronous-SGD Algorithms

    Authors: Rustem Islamov, Mher Safaryan, Dan Alistarh

    Abstract: We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting, where each worker has its own computation and communication speeds, as well as data distribution. In these algorithms, workers compute possibly stale and stochastic gradients associated with their local data at some iteration back in history and then return those gradients to the server without synchronizing…

    Submitted 31 October, 2023; originally announced October 2023.

  2. arXiv:2107.03860  [pdf, other]

    cs.LG stat.ML

    SSSE: Efficiently Erasing Samples from Trained Machine Learning Models

    Authors: Alexandra Peste, Dan Alistarh, Christoph H. Lampert

    Abstract: The availability of large amounts of user-provided data has been key to the success of machine learning for many real-world tasks. Recently, an increasing awareness has emerged that users should be given more control about how their data is used. In particular, users should have the right to prohibit the use of their data for training machine learning systems, and to have it erased from already tr…

    Submitted 8 July, 2021; originally announced July 2021.

  3. arXiv:2104.13818   

    cs.LG math.OC stat.ML

    NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

    Authors: Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy

    Abstract: As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression method for data-parallel SGD is QSGD (Alistarh et al., 2017), which quantizes and encodes gradients to reduce communication costs. The baseline variant of QSGD prov…

    Submitted 1 May, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

    Comments: This entry is redundant and was created in error. See arXiv:1908.06077 for the latest version

  4. arXiv:2010.12460  [pdf, other]

    cs.LG stat.ML

    Adaptive Gradient Quantization for Data-Parallel SGD

    Authors: Fartash Faghri, Iman Tabrizian, Ilia Markov, Dan Alistarh, Daniel Roy, Ali Ramezani-Kebrya

    Abstract: Many communication-efficient variants of SGD use gradient quantization schemes. These schemes are often heuristic and fixed over the course of training. We empirically observe that the statistics of gradients of deep models change during the training. Motivated by this observation, we introduce two adaptive quantization schemes, ALQ and AMQ. In both schemes, processors update their compression sch…

    Submitted 23 October, 2020; originally announced October 2020.

    Comments: Accepted at the conference on Neural Information Processing Systems (NeurIPS 2020)

  5. arXiv:2006.07362  [pdf, other]

    cs.LG stat.ML

    Stochastic Gradient Langevin with Delayed Gradients

    Authors: Vyacheslav Kungurtsev, Bapi Chatterjee, Dan Alistarh

    Abstract: Stochastic Gradient Langevin Dynamics (SGLD) ensures strong guarantees with regards to convergence in measure for sampling log-concave posterior distributions by adding noise to stochastic gradient iterates. Given the size of many practical problems, parallelizing across several asynchronously running processors is a popular strategy for reducing the end-to-end computation time of stochastic optim…

    Submitted 12 June, 2020; originally announced June 2020.

  6. arXiv:2004.14340  [pdf, other]

    cs.LG stat.ML

    WoodFisher: Efficient Second-Order Approximation for Neural Network Compression

    Authors: Sidak Pal Singh, Dan Alistarh

    Abstract: Second-order information, in the form of Hessian- or Inverse-Hessian-vector products, is a fundamental tool for solving optimization problems. Recently, there has been significant interest in utilizing this information in the context of deep neural networks; however, relatively little is known about the quality of existing approximations in this context. Our work examines this question, identifies…

    Submitted 25 November, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: NeurIPS 2020

  7. arXiv:2002.11505  [pdf, other]

    cs.DC cs.AI cs.LG stat.ML

    Relaxed Scheduling for Scalable Belief Propagation

    Authors: Vitaly Aksenov, Dan Alistarh, Janne H. Korhonen

    Abstract: The ability to leverage large-scale hardware parallelism has been one of the key enablers of the accelerated recent progress in machine learning. Consequently, there has been considerable effort invested into developing efficient parallel variants of classic machine learning algorithms. However, despite the wealth of knowledge on parallelization, some classic machine learning algorithms often prov…

    Submitted 18 January, 2021; v1 submitted 25 February, 2020; originally announced February 2020.

  8. arXiv:2002.10384  [pdf, other]

    cs.LG stat.ML

    On the Sample Complexity of Adversarial Multi-Source PAC Learning

    Authors: Nikola Konstantinov, Elias Frantar, Dan Alistarh, Christoph H. Lampert

    Abstract: We study the problem of learning from multiple untrusted data sources, a scenario of increasing practical relevance given the recent emergence of crowdsourcing and collaborative learning paradigms. Specifically, we analyze the situation in which a learning system obtains datasets from multiple sources, some of which might be biased or even adversarially perturbed. It is known that in the single-so…

    Submitted 30 June, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

    Comments: International Conference on Machine Learning (ICML) 2020: Camera-ready. Strengthened the definition of adversarial PAC-learnability, added explicit bounds on sample complexity

  9. arXiv:2002.09268  [pdf, other]

    cs.LG cs.DC stat.ML

    New Bounds For Distributed Mean Estimation and Variance Reduction

    Authors: Peter Davies, Vijaykrishna Gurunathan, Niusha Moshrefi, Saleh Ashkboos, Dan Alistarh

    Abstract: We consider the problem of distributed mean estimation (DME), in which $n$ machines are each given a local $d$-dimensional vector $x_v \in \mathbb{R}^d$, and must cooperate to estimate the mean of their inputs $\mu = \frac{1}{n}\sum_{v = 1}^n x_v$, while minimizing total communication cost. DME is a fundamental construct in distributed machine learning, and there has been considerable work on variants…

    Submitted 7 April, 2021; v1 submitted 21 February, 2020; originally announced February 2020.

    Comments: 42 pages, 16 figures

  10. arXiv:2001.05918  [pdf, other]

    cs.LG stat.ML

    Elastic Consistency: A General Consistency Model for Distributed Stochastic Gradient Descent

    Authors: Giorgi Nadiradze, Ilia Markov, Bapi Chatterjee, Vyacheslav Kungurtsev, Dan Alistarh

    Abstract: Machine learning has made tremendous progress in recent years, with models matching or even surpassing humans on a series of specialized tasks. One key element behind the progress of machine learning in recent years has been the ability to train machine learning models in large-scale distributed shared-memory and message-passing environments. Many of these models are trained employing variants of…

    Submitted 28 June, 2020; v1 submitted 16 January, 2020; originally announced January 2020.

  11. arXiv:1910.12308  [pdf, other]

    cs.LG cs.DC stat.ML

    Asynchronous Decentralized SGD with Quantized and Local Updates

    Authors: Giorgi Nadiradze, Amirmojtaba Sabour, Peter Davies, Shigang Li, Dan Alistarh

    Abstract: Decentralized optimization is emerging as a viable alternative for scalable distributed machine learning, but also introduces new challenges in terms of synchronization costs. To this end, several communication-reduction techniques, such as non-blocking communication, quantization, and local steps, have been explored in the decentralized setting. Due to the complexity of analyzing optimization in…

    Submitted 25 March, 2022; v1 submitted 27 October, 2019; originally announced October 2019.

  12. arXiv:1909.02253  [pdf, other]

    cs.LG stat.ML

    Powerset Convolutional Neural Networks

    Authors: Chris Wendler, Dan Alistarh, Markus Püschel

    Abstract: We present a novel class of convolutional neural networks (CNNs) for set functions, i.e., data indexed with the powerset of a finite set. The convolutions are derived as linear, shift-equivariant functions for various notions of shifts on set functions. The framework is fundamentally different from graph convolutions based on the Laplacian, as it provides not one but several basic shifts, one for…

    Submitted 15 January, 2020; v1 submitted 5 September, 2019; originally announced September 2019.

    Comments: Advances in Neural Information Processing Systems 32

    Journal ref: Advances in Neural Information Processing Systems, Vol. 32, pp. 927-938, 2019

  13. arXiv:1908.06077  [pdf, other]

    cs.LG stat.ML

    NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

    Authors: Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy

    Abstract: As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression method for data-parallel SGD is QSGD (Alistarh et al., 2017), which quantizes and encodes gradients to reduce communication costs. The baseline variant of QSGD prov…

    Submitted 3 May, 2021; v1 submitted 16 August, 2019; originally announced August 2019.

    Comments: 42 pages, 21 figures. To appear in the Journal of Machine Learning Research (JMLR)

  14. arXiv:1904.03257  [pdf, ps, other]

    cs.LG cs.DB cs.DC cs.SE stat.ML

    MLSys: The New Frontier of Machine Learning Systems

    Authors: Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, et al. (44 additional authors not shown)

    Abstract: Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a ne…

    Submitted 1 December, 2019; v1 submitted 29 March, 2019; originally announced April 2019.

  15. arXiv:1809.10505  [pdf, other]

    cs.LG cs.DC stat.ML

    The Convergence of Sparsified Gradient Methods

    Authors: Dan Alistarh, Torsten Hoefler, Mikael Johansson, Sarit Khirirat, Nikola Konstantinov, Cédric Renggli

    Abstract: Distributed training of massive machine learning models, in particular deep neural networks, via Stochastic Gradient Descent (SGD) is becoming commonplace. Several families of communication-reduction methods, such as quantization, large-batch methods, and gradient sparsification, have been proposed. To date, gradient sparsification methods - where each node sorts gradients by magnitude, and only c…

    Submitted 27 September, 2018; originally announced September 2018.

    Comments: NIPS 2018 - Advances in Neural Information Processing Systems; Authors in alphabetic order

  16. arXiv:1803.08917  [pdf, other]

    cs.LG cs.DC cs.DS math.OC stat.ML

    Byzantine Stochastic Gradient Descent

    Authors: Dan Alistarh, Zeyuan Allen-Zhu, Jerry Li

    Abstract: This paper studies the problem of distributed stochastic optimization in an adversarial setting where, out of the $m$ machines which allegedly compute stochastic gradients every iteration, an $\alpha$-fraction are Byzantine, and can behave arbitrarily and adversarially. Our main result is a variant of stochastic gradient descent (SGD) which finds $\varepsilon$-approximate minimizers of convex functions…

    Submitted 23 March, 2018; originally announced March 2018.

  17. arXiv:1803.08841  [pdf, other]

    cs.DC cs.LG stat.ML

    The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory

    Authors: Dan Alistarh, Christopher De Sa, Nikola Konstantinov

    Abstract: Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks. Given the recent practical focus on distributed machine learning, significant work has been dedicated to the convergence properties of this algorithm under the inconsistent and noisy updates arising from ex…

    Submitted 22 June, 2018; v1 submitted 23 March, 2018; originally announced March 2018.

    Comments: To be published in PoDC 2018; 18 pages, 1 figure; Changes: added pseudocode for Algorithm 2, some references and corrected typos

  18. arXiv:1802.08021  [pdf, other]

    cs.DC stat.ML

    SparCML: High-Performance Sparse Communication for Machine Learning

    Authors: Cedric Renggli, Saleh Ashkboos, Mehdi Aghagolzadeh, Dan Alistarh, Torsten Hoefler

    Abstract: Applying machine learning techniques to the quickly growing data in science and industry requires highly-scalable algorithms. Large datasets are most commonly processed "data parallel" distributed across many nodes. Each node's contribution to the overall gradient is summed using a global allreduce. This allreduce is the single communication and thus scalability bottleneck for most machine learnin…

    Submitted 16 August, 2019; v1 submitted 22 February, 2018; originally announced February 2018.

  19. Compressive Sensing Using Iterative Hard Thresholding with Low Precision Data Representation: Theory and Applications

    Authors: Nezihe Merve Gürel, Kaan Kara, Alen Stojanov, Tyler Smith, Thomas Lemmin, Dan Alistarh, Markus Püschel, Ce Zhang

    Abstract: Modern scientific instruments produce vast amounts of data, which can overwhelm the processing ability of computer systems. Lossy compression of data is an intriguing solution, but comes with its own drawbacks, such as potential signal loss, and the need for careful optimization of the compression ratio. In this work, we focus on a setting where this problem is especially acute: compressive sensin…

    Submitted 22 December, 2020; v1 submitted 13 February, 2018; originally announced February 2018.

    Comments: 19 pages, 5 figures, 1 table, in IEEE Transactions on Signal Processing Vol. 68, No. 7, pp. 4268-4282, 2020

  20. arXiv:1611.05402  [pdf, other]

    cs.LG stat.ML

    The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning

    Authors: Hantian Zhang, Jerry Li, Kaan Kara, Dan Alistarh, Ji Liu, Ce Zhang

    Abstract: Recently there has been significant interest in training machine-learning models at low precision: by reducing precision, one can reduce computation and communication by one order of magnitude. We examine training at reduced precision, both from a theoretical and practical perspective, and ask: is it possible to train models at end-to-end low precision with provable guarantees? Can this lead to co…

    Submitted 19 June, 2017; v1 submitted 16 November, 2016; originally announced November 2016.