
Showing 1–11 of 11 results for author: Kaplun, G

Searching in archive cs.
  1. arXiv:2309.01640  [pdf, other]

    cs.LG cs.AI

    Corgi^2: A Hybrid Offline-Online Approach To Storage-Aware Data Shuffling For SGD

    Authors: Etay Livne, Gal Kaplun, Eran Malach, Shai Shalev-Shwartz

    Abstract: When using Stochastic Gradient Descent (SGD) for training machine learning models, it is often crucial to provide the model with examples sampled at random from the dataset. However, for large datasets stored in the cloud, random access to individual examples is often costly and inefficient. A recent work [corgi] proposed an online shuffling algorithm called CorgiPile, which greatly improves…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: 19 pages, 5 figures

  2. arXiv:2306.08590  [pdf, other]

    cs.LG stat.ML

    Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning

    Authors: Nikhil Vyas, Depen Morwani, Rosie Zhao, Gal Kaplun, Sham Kakade, Boaz Barak

    Abstract: The success of SGD in deep learning has been ascribed by prior works to the implicit bias induced by finite batch sizes ("SGD noise"). While prior works focused on offline learning (i.e., multiple-epoch training), we study the impact of SGD noise on online (i.e., single epoch) learning. Through an extensive empirical analysis of image and language data, we demonstrate that small batch sizes do not…

    Submitted 7 June, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

  3. arXiv:2302.06354  [pdf, other]

    cs.LG cs.AI

    Less is More: Selective Layer Finetuning with SubTuning

    Authors: Gal Kaplun, Andrey Gurevich, Tal Swisa, Mazor David, Shai Shalev-Shwartz, Eran Malach

    Abstract: Finetuning a pretrained model has become a standard approach for training neural networks on novel tasks, resulting in fast convergence and improved performance. In this work, we study an alternative finetuning method, where instead of finetuning all the weights of the network, we only train a carefully chosen subset of layers, keeping the rest of the weights frozen at their initial (pretrained) v…

    Submitted 2 July, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

  4. arXiv:2203.14649  [pdf, other]

    cs.LG cs.AI stat.ML

    Knowledge Distillation: Bad Models Can Be Good Role Models

    Authors: Gal Kaplun, Eran Malach, Preetum Nakkiran, Shai Shalev-Shwartz

    Abstract: Large neural networks trained in the overparameterized regime are able to fit noise to zero train error. Recent work [nakkiran2020distributional] has empirically observed that such networks behave as "conditional samplers" from the noisy distribution. That is, they replicate the noise in the train data to unseen examples. We give a theoretical framework for studying this conditional sampling…

    Submitted 28 March, 2022; originally announced March 2022.

  5. arXiv:2202.09931  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Deconstructing Distributions: A Pointwise Framework of Learning

    Authors: Gal Kaplun, Nikhil Ghosh, Saurabh Garg, Boaz Barak, Preetum Nakkiran

    Abstract: In machine learning, we traditionally evaluate the performance of a single model, averaged over a collection of test inputs. In this work, we propose a new approach: we measure the performance of a collection of models when evaluated on a $\textit{single input point}$. Specifically, we study a point's $\textit{profile}$: the relationship between models' average performance on the test distribution…

    Submitted 7 June, 2022; v1 submitted 20 February, 2022; originally announced February 2022.

    Comments: GK and NG contributed equally. v2: Added Figures 4, 5

  6. arXiv:2103.06875  [pdf, other]

    cs.LG

    For Manifold Learning, Deep Neural Networks can be Locality Sensitive Hash Functions

    Authors: Nishanth Dikkala, Gal Kaplun, Rina Panigrahy

    Abstract: It is well established that training deep neural networks gives useful representations that capture essential features of the inputs. However, these representations are poorly understood in theory and practice. In the context of supervised learning an important question is whether these representations capture features informative for classification, while filtering out non-informative noisy ones.…

    Submitted 11 March, 2021; originally announced March 2021.

  7. arXiv:2010.08508  [pdf, other]

    cs.LG cs.NE stat.ML

    For self-supervised learning, Rationality implies generalization, provably

    Authors: Yamini Bansal, Gal Kaplun, Boaz Barak

    Abstract: We prove a new upper bound on the generalization gap of classifiers that are obtained by first using self-supervision to learn a representation $r$ of the training data, and then fitting a simple (e.g., linear) classifier $g$ to the labels. Specifically, we show that (under the assumptions described below) the generalization gap of such classifiers tends to zero if $\mathsf{C}(g) \ll n$, where…

    Submitted 16 October, 2020; originally announced October 2020.

  8. arXiv:2002.09422  [pdf, other]

    cs.LG stat.ML

    Robustness from Simple Classifiers

    Authors: Sharon Qian, Dimitris Kalimeris, Gal Kaplun, Yaron Singer

    Abstract: Despite the vast success of Deep Neural Networks in numerous application domains, it has been shown that such models are not robust, i.e., they are vulnerable to small adversarial perturbations of the input. While extensive work has been done on why such perturbations occur or how to successfully defend against them, we still do not have a complete understanding of robustness. In this work, we inve…

    Submitted 21 February, 2020; originally announced February 2020.

  9. arXiv:1912.02292  [pdf, other]

    cs.LG cs.CV cs.NE stat.ML

    Deep Double Descent: Where Bigger Models and More Data Hurt

    Authors: Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever

    Abstract: We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better. Moreover, we show that double descent occurs not just as a function of model size, but also as a function of the number of training epochs. We unify the above phenomena by defining a new complexity measure we call the effect…

    Submitted 4 December, 2019; originally announced December 2019.

    Comments: G.K. and Y.B. contributed equally

  10. arXiv:1905.11604  [pdf, other]

    cs.LG cs.NE stat.ML

    SGD on Neural Networks Learns Functions of Increasing Complexity

    Authors: Preetum Nakkiran, Gal Kaplun, Dimitris Kalimeris, Tristan Yang, Benjamin L. Edelman, Fred Zhang, Boaz Barak

    Abstract: We perform an experimental study of the dynamics of Stochastic Gradient Descent (SGD) in learning deep neural networks for several real and synthetic classification tasks. We show that in the initial epochs, almost all of the performance improvement of the classifier obtained by SGD can be explained by a linear classifier. More generally, we give evidence for the hypothesis that, as iterations pro…

    Submitted 28 May, 2019; originally announced May 2019.

    Comments: Submitted to NeurIPS 2019

  11. arXiv:1903.03746  [pdf, other]

    cs.LG stat.ML

    Robust Influence Maximization for Hyperparametric Models

    Authors: Dimitris Kalimeris, Gal Kaplun, Yaron Singer

    Abstract: In this paper, we study the problem of robust influence maximization in the independent cascade model under a hyperparametric assumption. In social networks, users influence and are influenced by individuals with similar characteristics and as such, they are associated with some features. A recent surging research direction in influence maximization focuses on the case where the edge probabilities…

    Submitted 12 May, 2019; v1 submitted 9 March, 2019; originally announced March 2019.