Showing 1–8 of 8 results for author: Gadhikar, A

  1. arXiv:2406.02773  [pdf, other]

    cs.LG cs.CV

    Cyclic Sparse Training: Is it Enough?

    Authors: Advait Gadhikar, Sree Harsha Nelaturu, Rebekka Burkholz

    Abstract: The success of iterative pruning methods in achieving state-of-the-art sparse networks has largely been attributed to improved mask identification and an implicit regularization induced by pruning. We challenge this hypothesis and instead posit that their repeated cyclic training schedules enable improved optimization. To verify this, we show that pruning at initialization is significantly boosted…

    Submitted 7 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  2. arXiv:2402.19262  [pdf, other]

    cs.LG

    Masks, Signs, And Learning Rate Rewinding

    Authors: Advait Gadhikar, Rebekka Burkholz

    Abstract: Learning Rate Rewinding (LRR) has been established as a strong variant of Iterative Magnitude Pruning (IMP) to find lottery tickets in deep overparameterized neural networks. While both iterative pruning schemes couple structure and parameter learning, understanding how LRR excels in both aspects can bring us closer to the design of more flexible deep learning algorithms that can optimize diverse…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted for publication at ICLR 2024

  3. arXiv:2210.02412  [pdf, other]

    cs.LG

    Why Random Pruning Is All We Need to Start Sparse

    Authors: Advait Gadhikar, Sohom Mukherjee, Rebekka Burkholz

    Abstract: Random masks define surprisingly effective sparse neural network models, as has been shown empirically. The resulting sparse networks can often compete with dense architectures and state-of-the-art lottery ticket pruning algorithms, even though they do not rely on computationally expensive prune-train iterations and can be drawn initially without significant computational overhead. We offer a theo…

    Submitted 31 May, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: Accepted for publication at ICML, 2023

  4. arXiv:2210.02411  [pdf, other]

    cs.LG

    Dynamical Isometry for Residual Networks

    Authors: Advait Gadhikar, Rebekka Burkholz

    Abstract: The training success, training speed and generalization ability of neural networks rely crucially on the choice of random parameter initialization. It has been shown for multiple architectures that initial dynamical isometry is particularly advantageous. Known initialization schemes for residual blocks, however, miss this property and suffer from degrading separability of different inputs for incr…

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: 22 pages, 5 figures

  5. arXiv:2110.11150  [pdf, ps, other]

    cs.LG cs.AI

    Lottery Tickets with Nonzero Biases

    Authors: Jonas Fischer, Advait Gadhikar, Rebekka Burkholz

    Abstract: The strong lottery ticket hypothesis holds the promise that pruning randomly initialized deep neural networks could offer a computationally efficient alternative to deep learning with stochastic gradient descent. Common parameter initialization schemes and existence proofs, however, are focused on networks with zero biases, thus foregoing the potential universal approximation property of pruning.…

    Submitted 7 June, 2022; v1 submitted 21 October, 2021; originally announced October 2021.

  6. arXiv:2110.07751  [pdf, other]

    cs.LG stat.ML

    Leveraging Spatial and Temporal Correlations in Sparsified Mean Estimation

    Authors: Divyansh Jhunjhunwala, Ankur Mallick, Advait Gadhikar, Swanand Kadhe, Gauri Joshi

    Abstract: We study the problem of estimating at a central server the mean of a set of vectors distributed across several nodes (one vector per node). When the vectors are high-dimensional, the communication cost of sending entire vectors may be prohibitive, and it may be imperative for them to use sparsification techniques. While most existing work on sparsified mean estimation is agnostic to the characteri…

    Submitted 14 October, 2021; originally announced October 2021.

    Comments: Accepted to NeurIPS 2021

  7. arXiv:2107.06917  [pdf, other]

    cs.LG

    A Field Guide to Federated Optimization

    Authors: Jianyu Wang, Zachary Charles, Zheng Xu, Gauri Joshi, H. Brendan McMahan, Blaise Aguera y Arcas, Maruan Al-Shedivat, Galen Andrew, Salman Avestimehr, Katharine Daly, Deepesh Data, Suhas Diggavi, Hubert Eichner, Advait Gadhikar, Zachary Garrett, Antonious M. Girgis, Filip Hanzely, Andrew Hard, Chaoyang He, Samuel Horvath, Zhouyuan Huo, Alex Ingerman, Martin Jaggi, Tara Javidi, Peter Kairouz, et al. (28 additional authors not shown)

    Abstract: Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection. The distributed learning process can be formulated as solving federated optimization problems, which emphasize communication efficiency, data heterogeneity, compatibility with privacy and system requirements, and…

    Submitted 14 July, 2021; originally announced July 2021.

  8. arXiv:2102.04487  [pdf, other]

    cs.LG cs.DC stat.ML

    Adaptive Quantization of Model Updates for Communication-Efficient Federated Learning

    Authors: Divyansh Jhunjhunwala, Advait Gadhikar, Gauri Joshi, Yonina C. Eldar

    Abstract: Communication of model updates between client nodes and the central aggregating server is a major bottleneck in federated learning, especially in bandwidth-limited settings and high-dimensional models. Gradient quantization is an effective way of reducing the number of bits required to communicate each model update, albeit at the cost of having a higher error floor due to the higher variance of th…

    Submitted 8 February, 2021; originally announced February 2021.

    Comments: Accepted to ICASSP 2021