
Showing 1–18 of 18 results for author: Yehudai, G

  1. arXiv:2407.02240

    cs.LG cs.CR cs.NE stat.ML

    MALT Powers Up Adversarial Attacks

    Authors: Odelia Melamed, Gilad Yehudai, Adi Shamir

    Abstract: Current adversarial attacks for multi-class classifiers choose the target class for a given input naively, based on the classifier's confidence levels for various target classes. We present a novel adversarial targeting method, MALT (Mesoscopic Almost Linearity Targeting), based on medium-scale almost linearity assumptions. Our attack outperforms the current state-of-the-art AutoAttack on t…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2401.07606

    cs.LG math.OC stat.ML

    RedEx: Beyond Fixed Representation Methods via Convex Optimization

    Authors: Amit Daniely, Mariano Schain, Gilad Yehudai

    Abstract: Optimizing neural networks is a difficult task which is still not well understood. On the other hand, fixed representation methods such as kernels and random features have provable optimization guarantees but inferior performance due to their inherent inability to learn the representations. In this paper, we aim to bridge this gap by presenting a novel architecture called RedEx (Reduced Expander…

    Submitted 15 January, 2024; originally announced January 2024.

  3. arXiv:2311.13877

    cs.LG math.OC stat.ML

    Locally Optimal Descent for Dynamic Stepsize Scheduling

    Authors: Gilad Yehudai, Alon Cohen, Amit Daniely, Yoel Drori, Tomer Koren, Mariano Schain

    Abstract: We introduce a novel dynamic learning-rate scheduling scheme grounded in theory with the goal of simplifying the manual and time-consuming tuning of schedules in practice. Our approach is based on estimating the locally-optimal stepsize, guaranteeing maximal descent in the direction of the stochastic gradient of the current step. We first establish theoretical convergence bounds for our method wit…

    Submitted 23 November, 2023; originally announced November 2023.

  4. arXiv:2307.01827

    cs.LG

    Deconstructing Data Reconstruction: Multiclass, Weight Decay and General Losses

    Authors: Gon Buzaglo, Niv Haim, Gilad Yehudai, Gal Vardi, Yakir Oz, Yaniv Nikankin, Michal Irani

    Abstract: Memorization of training data is an active research area, yet our understanding of the inner workings of neural networks is still in its infancy. Recently, Haim et al. (2022) proposed a scheme to reconstruct training samples from multilayer perceptron binary classifiers, effectively demonstrating that a large portion of training samples are encoded in the parameters of such networks. In this work,…

    Submitted 2 November, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

    Comments: Code: https://github.com/gonbuzaglo/decoreco. arXiv admin note: text overlap with arXiv:2305.03350

  5. arXiv:2305.15141

    cs.LG cs.NE stat.ML

    From Tempered to Benign Overfitting in ReLU Neural Networks

    Authors: Guy Kornowski, Gilad Yehudai, Ohad Shamir

    Abstract: Overparameterized neural networks (NNs) are observed to generalize well even when trained to perfectly fit noisy data. This phenomenon motivated a large body of work on "benign overfitting", where interpolating predictors achieve near-optimal performance. Recently, it was conjectured and empirically observed that the behavior of NNs is often better described as "tempered overfitting", where the pe…

    Submitted 21 March, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023; fixed bug

  6. arXiv:2305.03350

    cs.LG cs.CR cs.CV

    Reconstructing Training Data from Multiclass Neural Networks

    Authors: Gon Buzaglo, Niv Haim, Gilad Yehudai, Gal Vardi, Michal Irani

    Abstract: Reconstructing samples from the training set of trained neural networks is a major privacy concern. Haim et al. (2022) recently showed that it is possible to reconstruct training samples from neural network binary classifiers, based on theoretical results about the implicit bias of gradient methods. In this work, we present several improvements and new insights over this previous work. As our main…

    Submitted 5 May, 2023; originally announced May 2023.

  7. arXiv:2303.00783

    cs.LG cs.CR cs.NE stat.ML

    Adversarial Examples Exist in Two-Layer ReLU Networks for Low Dimensional Linear Subspaces

    Authors: Odelia Melamed, Gilad Yehudai, Gal Vardi

    Abstract: Despite a great deal of research, it is still not well-understood why trained neural networks are highly vulnerable to adversarial examples. In this work we focus on two-layer neural networks trained using data which lie on a low dimensional linear subspace. We show that standard gradient methods lead to non-robust neural networks, namely, networks which have large gradients in directions orthogon…

    Submitted 16 November, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: Camera ready version for NeurIPS 2023

  8. arXiv:2206.07758

    cs.LG cs.CR cs.CV cs.NE stat.ML

    Reconstructing Training Data from Trained Neural Networks

    Authors: Niv Haim, Gal Vardi, Gilad Yehudai, Ohad Shamir, Michal Irani

    Abstract: Understanding to what extent neural networks memorize training data is an intriguing question with practical and theoretical implications. In this paper we show that in some cases a significant fraction of the training data can in fact be reconstructed from the parameters of a trained neural network classifier. We propose a novel reconstruction scheme that stems from recent theoretical results abo…

    Submitted 5 December, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: Fixed a typo in the acknowledgements

  9. arXiv:2202.04347

    cs.LG

    Gradient Methods Provably Converge to Non-Robust Networks

    Authors: Gal Vardi, Gilad Yehudai, Ohad Shamir

    Abstract: Despite a great deal of research, it is still unclear why neural networks are so susceptible to adversarial examples. In this work, we identify natural settings where depth-$2$ ReLU networks trained with gradient flow are provably non-robust (susceptible to small adversarial $\ell_2$-perturbations), even when robust networks that classify the training dataset correctly exist. Perhaps surprisingly,…

    Submitted 4 October, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

    Comments: Minor fixes made for the NeurIPS CR version

  10. arXiv:2202.03841

    cs.LG cs.NE stat.ML

    Width is Less Important than Depth in ReLU Neural Networks

    Authors: Gal Vardi, Gilad Yehudai, Ohad Shamir

    Abstract: We solve an open question from Lu et al. (2017) by showing that any target network with inputs in $\mathbb{R}^d$ can be approximated by a width $O(d)$ network (independent of the target network's architecture), whose number of parameters is essentially larger only by a linear factor. In light of previous depth separation theorems, which imply that a similar result cannot hold when the roles of wi…

    Submitted 1 June, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

    Comments: Camera ready version in COLT 2022

  11. arXiv:2110.03187

    cs.LG cs.NE stat.ML

    On the Optimal Memorization Power of ReLU Neural Networks

    Authors: Gal Vardi, Gilad Yehudai, Ohad Shamir

    Abstract: We study the memorization power of feedforward ReLU neural networks. We show that such networks can memorize any $N$ points that satisfy a mild separability assumption using $\tilde{O}(\sqrt{N})$ parameters. Known VC-dimension upper bounds imply that memorizing $N$ samples requires $\Omega(\sqrt{N})$ parameters, and hence our construction is optimal up to logarithmic factors. We also give a…

    Submitted 7 October, 2021; originally announced October 2021.

  12. arXiv:2106.01101

    cs.LG cs.NE stat.ML

    Learning a Single Neuron with Bias Using Gradient Descent

    Authors: Gal Vardi, Gilad Yehudai, Ohad Shamir

    Abstract: We theoretically study the fundamental problem of learning a single neuron with a bias term ($\mathbf{x} \mapsto \sigma(\langle\mathbf{w},\mathbf{x}\rangle + b)$) in the realizable setting with the ReLU activation, using gradient descent. Perhaps surprisingly, we show that this is a significantly different and more challenging problem than the bias-less case (which was the focus of previous works on single neurons…

    Submitted 5 February, 2022; v1 submitted 2 June, 2021; originally announced June 2021.

    Comments: An updated version, corresponding to the NeurIPS 2021 camera-ready version

  13. arXiv:2102.00434

    cs.LG cs.NE stat.ML

    The Connection Between Approximation, Depth Separation and Learnability in Neural Networks

    Authors: Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir

    Abstract: Several recent works have shown separation results between deep neural networks and hypothesis classes with inferior approximation capacity, such as shallow networks or kernel classes. On the other hand, the fact that deep networks can efficiently express a target function does not mean that this target function can be learned efficiently by deep neural networks. In this work we study the intricat…

    Submitted 18 July, 2021; v1 submitted 31 January, 2021; originally announced February 2021.

    Comments: COLT 2021 camera ready version

  14. arXiv:2010.08853

    cs.LG cs.NE stat.ML

    From Local Structures to Size Generalization in Graph Neural Networks

    Authors: Gilad Yehudai, Ethan Fetaya, Eli Meirom, Gal Chechik, Haggai Maron

    Abstract: Graph neural networks (GNNs) can process graphs of different sizes, but their ability to generalize across sizes, specifically from small to large graphs, is still not well understood. In this paper, we identify an important type of data where generalization from small to large graphs is challenging: graph distributions for which the local structure depends on the graph size. This effect occurs in…

    Submitted 15 July, 2021; v1 submitted 17 October, 2020; originally announced October 2020.

    Comments: Camera ready version for ICML 2021

  15. arXiv:2006.01005

    cs.LG stat.ML

    The Effects of Mild Over-parameterization on the Optimization Landscape of Shallow ReLU Neural Networks

    Authors: Itay Safran, Gilad Yehudai, Ohad Shamir

    Abstract: We study the effects of mild over-parameterization on the optimization landscape of a simple ReLU neural network of the form $\mathbf{x}\mapsto\sum_{i=1}^k\max\{0,\mathbf{w}_i^{\top}\mathbf{x}\}$, in a well-studied teacher-student setting where the target values are generated by the same architecture, and when directly optimizing over the population squared loss with respect to Gaussian inputs. We…

    Submitted 30 July, 2021; v1 submitted 1 June, 2020; originally announced June 2020.

  16. arXiv:2002.00585

    cs.LG stat.ML

    Proving the Lottery Ticket Hypothesis: Pruning is All You Need

    Authors: Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir

    Abstract: The lottery ticket hypothesis (Frankle and Carbin, 2018) states that a randomly-initialized network contains a small subnetwork which, when trained in isolation, can compete with the performance of the original network. We prove an even stronger hypothesis (as was also conjectured in Ramanujan et al., 2019), showing that for every bounded distribution and every target network with bounded wei…

    Submitted 3 February, 2020; originally announced February 2020.

  17. arXiv:2001.05205

    cs.LG cs.NE stat.ML

    Learning a Single Neuron with Gradient Methods

    Authors: Gilad Yehudai, Ohad Shamir

    Abstract: We consider the fundamental problem of learning a single neuron $x \mapsto \sigma(w^\top x)$ using standard gradient methods. As opposed to previous works, which considered specific (and not always realistic) input distributions and activation functions $\sigma(\cdot)$, we ask whether a more general result is attainable, under milder assumptions. On the one hand, we show that some assumptions on the distribu…

    Submitted 27 February, 2022; v1 submitted 15 January, 2020; originally announced January 2020.

    Comments: Fixed a small bug in the proof of Theorem 4.2

  18. arXiv:1904.00687

    cs.LG cs.NE stat.ML

    On the Power and Limitations of Random Features for Understanding Neural Networks

    Authors: Gilad Yehudai, Ohad Shamir

    Abstract: Recently, a spate of papers have provided positive theoretical results for training over-parameterized neural networks (where the network size is larger than what is needed to achieve low error). The key insight is that with sufficient over-parameterization, gradient-based methods will implicitly leave some components of the network relatively unchanged, so the optimization dynamics will behave as…

    Submitted 27 February, 2022; v1 submitted 1 April, 2019; originally announced April 2019.

    Comments: Comparison to previous version: Fixed a bug in Theorem 3.4 about approximating polynomials as an expectation of random features. Also added another assumption on the activation function in Theorem 3.1.
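For arXiv:2311.13877, the idea of a locally-optimal stepsize can be seen on a deterministic quadratic $f(x) = \tfrac{1}{2} x^\top A x - b^\top x$: with gradient $g = Ax - b$, we have $f(x - \eta g) = f(x) - \eta\, g^\top g + \tfrac{1}{2}\eta^2 g^\top A g$, so the step that maximizes descent along $-g$ is $\eta^* = g^\top g / (g^\top A g)$ (classical exact line search). The sketch below covers only that textbook special case, assuming full gradients and a known, hypothetical `A` and `b`; it is not the paper's estimator for stochastic gradients.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def quad_loss(A, b, x):
    """f(x) = 0.5 * x^T A x - b^T x, assuming A is symmetric positive definite."""
    return 0.5 * dot(x, matvec(A, x)) - dot(b, x)


def grad(A, b, x):
    """Gradient of quad_loss: A x - b."""
    return [ax - bi for ax, bi in zip(matvec(A, x), b)]


def locally_optimal_step(A, g):
    """Stepsize minimizing f(x - eta * g) over eta for the quadratic above:
    eta* = (g^T g) / (g^T A g)."""
    return dot(g, g) / dot(g, matvec(A, g))


def descend(A, b, x, steps):
    """Gradient descent where every step uses the locally optimal stepsize."""
    for _ in range(steps):
        g = grad(A, b, x)
        if dot(g, g) == 0.0:  # exact stationary point reached
            break
        eta = locally_optimal_step(A, g)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x


A = [[3.0, 0.5], [0.5, 1.0]]  # hypothetical SPD matrix
b = [1.0, 2.0]                # hypothetical linear term
x_final = descend(A, b, [0.0, 0.0], steps=50)
# The minimizer solves A x = b, i.e. x = [0, 2]; x_final should be very close to it.
```

In the stochastic setting $A$ is not available and $g$ is noisy, so the quantities in $\eta^*$ would have to be estimated rather than computed exactly.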
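arXiv:2202.04347 and arXiv:2303.00783 concern small $\ell_2$-perturbations that change the output of depth-2 ReLU networks. As a minimal illustration, the sketch below uses a generic first-order attack (a linearization-based step, not the constructions analyzed in those papers): it computes the input gradient of a network $N(x) = \sum_j v_j \max\{0, \langle w_j, x\rangle\}$ and moves against the sign of $N(x)$ by the estimated distance $|N(x)| / \|\nabla N(x)\|_2$ to the decision boundary. The weights `W`, `v`, the `overshoot` factor, and all helper names are hypothetical.

```python
import math


def relu(z):
    return max(0.0, z)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def net(W, v, x):
    """Depth-2 ReLU network N(x) = sum_j v_j * relu(<w_j, x>)."""
    return sum(vj * relu(dot(wj, x)) for wj, vj in zip(W, v))


def input_grad(W, v, x):
    """Gradient of N with respect to x: sum_j v_j * 1[<w_j, x> > 0] * w_j."""
    g = [0.0] * len(x)
    for wj, vj in zip(W, v):
        if dot(wj, x) > 0:
            g = [gi + vj * wji for gi, wji in zip(g, wj)]
    return g


def l2_linearized_attack(W, v, x, overshoot=1.01):
    """Move x against the sign of N(x) along the normalized input gradient, by the
    first-order distance |N(x)| / ||grad N(x)|| to the boundary (times overshoot)."""
    out = net(W, v, x)
    g = input_grad(W, v, x)
    gnorm = math.sqrt(dot(g, g))
    if gnorm == 0.0:
        raise ValueError("zero input gradient: the linearization gives no direction")
    eps = overshoot * abs(out) / gnorm  # l2 norm of the perturbation
    sign = 1.0 if out > 0 else -1.0
    x_adv = [xi - sign * eps * gi / gnorm for xi, gi in zip(x, g)]
    return x_adv, eps


W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical hidden-layer weights
v = [1.0, -1.0, 0.5]                      # hypothetical output weights
x = [1.0, 0.5]
x_adv, eps = l2_linearized_attack(W, v, x)
# net(W, v, x) is 1.25; net(W, v, x_adv) is about -0.0125, so the sign flips with eps about 0.80.
```

In this example no hidden unit changes its activation along the step, so the network is exactly linear on the segment and the output moves exactly as the linearization predicts; in general the first-order distance is only an estimate.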
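arXiv:2106.01101 studies gradient descent for a single ReLU neuron with bias, $\mathbf{x} \mapsto \sigma(\langle\mathbf{w},\mathbf{x}\rangle + b)$, in the realizable setting. A minimal sketch of that training setup follows, assuming standard Gaussian inputs, a hypothetical teacher `(w_star, b_star)`, the empirical squared loss, a hypothetical initialization, and a fixed learning rate. It shows the procedure only and makes no claim about the convergence results proved in the paper.

```python
import random


def relu(z):
    return max(0.0, z)


def pre_activation(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def neuron(w, b, x):
    """Single ReLU neuron with bias: x -> relu(<w, x> + b)."""
    return relu(pre_activation(w, b, x))


def empirical_loss(w, b, data):
    """0.5 * mean of (neuron(x) - y)^2 over the sample."""
    return 0.5 * sum((neuron(w, b, x) - y) ** 2 for x, y in data) / len(data)


def gd_step(w, b, data, lr):
    """One full-batch gradient descent step, using 1[z > 0] as the ReLU derivative."""
    gw = [0.0] * len(w)
    gb = 0.0
    for x, y in data:
        z = pre_activation(w, b, x)
        if z > 0:  # inactive examples contribute no gradient
            r = z - y
            gw = [g + r * xi for g, xi in zip(gw, x)]
            gb += r
    n = len(data)
    return [wi - lr * g / n for wi, g in zip(w, gw)], b - lr * gb / n


# Realizable setting: labels come from a hypothetical teacher neuron (w_star, b_star).
rng = random.Random(0)
w_star, b_star = [1.0, -0.5], 0.2
data = []
for _ in range(200):
    x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    data.append((x, neuron(w_star, b_star, x)))

w, b = [0.5, 0.5], 0.0  # hypothetical initialization
initial_loss = empirical_loss(w, b, data)
for _ in range(500):
    w, b = gd_step(w, b, data, lr=0.1)
final_loss = empirical_loss(w, b, data)
```

Examples on which the current neuron is inactive contribute nothing to the gradient, so the initialization `(w, b)` determines which data points drive the first steps.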
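The objective in arXiv:2006.01005 is the population squared loss of $\mathbf{x}\mapsto\sum_{i=1}^k\max\{0,\mathbf{w}_i^{\top}\mathbf{x}\}$ against a teacher of the same form under Gaussian inputs. The paper works with this population loss directly; the sketch below merely estimates it by Monte Carlo, assuming a factor of $\tfrac{1}{2}$, inputs $\mathbf{x} \sim \mathcal{N}(0, I)$, and hypothetical function names and weights.

```python
import random


def relu_net(W, x):
    """x -> sum_i max(0, <w_i, x>): no output weights, no bias."""
    return sum(max(0.0, sum(wij * xj for wij, xj in zip(wi, x))) for wi in W)


def population_loss_mc(W, V, num_samples=20000, seed=0):
    """Monte Carlo estimate of 0.5 * E_{x ~ N(0, I)}[(relu_net(W, x) - relu_net(V, x))^2].

    W holds the student's weight vectors and V the teacher's; they may have different
    numbers of neurons but must share the input dimension. The 0.5 factor is an assumption.
    """
    rng = random.Random(seed)
    d = len(V[0])
    total = 0.0
    for _ in range(num_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        diff = relu_net(W, x) - relu_net(V, x)
        total += diff * diff
    return 0.5 * total / num_samples


V = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical teacher with k = 2 neurons
print(population_loss_mc(V, V))                                     # 0.0: student equals teacher
print(population_loss_mc([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], V))  # 0.0: an extra zero neuron changes nothing
print(population_loss_mc([[0.0, 0.0]], [[1.0, 0.0]]))               # close to 0.25 = 0.5 * E[max(0, x_1)^2]
```

A Monte Carlo estimate is noisy and only approximates the expectation; it is useful as a sanity check, not as a substitute for the closed-form Gaussian expectations one would use to analyze the landscape.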
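arXiv:1904.00687 and arXiv:2401.07606 discuss random features, a fixed-representation method: the first layer stays at its random initialization and only a linear output layer is trained, which for the squared loss is a convex problem. A minimal pure-Python sketch for scalar inputs follows, assuming Gaussian random weights and biases, ridge regression for the output layer, and a hypothetical target $\sin(2x)$; none of these choices are taken from the papers.

```python
import math
import random


def random_relu_features(num_features, seed=0):
    """Fixed, untrained first layer: (w_j, c_j) drawn i.i.d. from N(0, 1) (an assumption)."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(num_features)]


def featurize(params, x):
    """phi_j(x) = max(0, w_j * x + c_j) for a scalar input x."""
    return [max(0.0, w * x + c) for w, c in params]


def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def fit_output_layer(params, xs, ys, lam=1e-3):
    """Train only the linear output layer by ridge regression:
    (Phi^T Phi + lam * I) a = Phi^T y."""
    Phi = [featurize(params, x) for x in xs]
    m = len(params)
    A = [[sum(row[i] * row[j] for row in Phi) + (lam if i == j else 0.0) for j in range(m)]
         for i in range(m)]
    rhs = [sum(row[i] * y for row, y in zip(Phi, ys)) for i in range(m)]
    return solve(A, rhs)


def predict(params, a, x):
    return sum(ai * fi for ai, fi in zip(a, featurize(params, x)))


xs = [-2.0 + 4.0 * i / 59 for i in range(60)]  # 60 evenly spaced inputs in [-2, 2]
ys = [math.sin(2.0 * x) for x in xs]          # hypothetical target function
params = random_relu_features(30, seed=1)
a = fit_output_layer(params, xs, ys)
train_mse = sum((predict(params, a, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

The ridge term `lam` keeps the linear system positive definite even when some random features are zero on every training input.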