Showing 1–16 of 16 results for author: Hayou, S

Searching in archive stat.
  1. arXiv:2406.08447  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    The Impact of Initialization on LoRA Finetuning Dynamics

    Authors: Soufiane Hayou, Nikhil Ghosh, Bin Yu

    Abstract: In this paper, we study the role of initialization in Low Rank Adaptation (LoRA) as originally introduced in Hu et al. (2021). Essentially, to start from the pretrained model as initialization for finetuning, one can either initialize B to zero and A to random (default initialization in PEFT package), or vice-versa. In both cases, the product BA is equal to zero at initialization, which makes fine…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: TL;DR: Different initializations lead to completely different finetuning dynamics. One initialization (set A random and B zero) is generally better than the natural opposite initialization. arXiv admin note: text overlap with arXiv:2402.12354

  2. arXiv:2402.12354  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    LoRA+: Efficient Low Rank Adaptation of Large Models

    Authors: Soufiane Hayou, Nikhil Ghosh, Bin Yu

    Abstract: In this paper, we show that Low Rank Adaptation (LoRA) as originally introduced in Hu et al. (2021) leads to suboptimal finetuning of models with large width (embedding dimension). This is due to the fact that adapter matrices A and B in LoRA are updated with the same learning rate. Using scaling arguments for large width networks, we demonstrate that using the same learning rate for A and B does…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 27 pages

  3. arXiv:2310.01683  [pdf, other]

    stat.ML cs.LG

    Commutative Width and Depth Scaling in Deep Neural Networks

    Authors: Soufiane Hayou

    Abstract: This paper is the second in the series Commutative Scaling of Width and Depth (WD) about commutativity of infinite width and depth limits in deep neural networks. Our aim is to understand the behaviour of neural functions (functions that depend on a neural network model) as width and depth go to infinity (in some sense), and eventually identify settings under which commutativity holds, i.e. the ne…

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: 41 pages, 6 figures. arXiv admin note: substantial text overlap with arXiv:2302.00453

  4. arXiv:2309.09171  [pdf, other]

    stat.ML cs.LG

    On the Connection Between Riemann Hypothesis and a Special Class of Neural Networks

    Authors: Soufiane Hayou

    Abstract: The Riemann hypothesis (RH) is a long-standing open problem in mathematics. It conjectures that non-trivial zeros of the zeta function all have real part equal to 1/2. The extent of the consequences of RH is far-reaching and touches a wide spectrum of topics including the distribution of prime numbers, the growth of arithmetic functions, the growth of the Euler totient, etc. In this note, we revisit a…

    Submitted 17 September, 2023; originally announced September 2023.

  5. arXiv:2302.06960  [pdf, other]

    stat.ML cs.LG

    Data pruning and neural scaling laws: fundamental limitations of score-based algorithms

    Authors: Fadhel Ayed, Soufiane Hayou

    Abstract: Data pruning algorithms are commonly used to reduce the memory and computational cost of the optimization process. Recent empirical results reveal that random data pruning remains a strong baseline and outperforms most existing data pruning methods in the high compression regime, i.e., where a fraction of $30\%$ or less of the data is kept. This regime has recently attracted a lot of interest as a…

    Submitted 6 November, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

  6. arXiv:2302.00453  [pdf, other]

    stat.ML cs.LG

    Width and Depth Limits Commute in Residual Networks

    Authors: Soufiane Hayou, Greg Yang

    Abstract: We show that taking the width and depth to infinity in a deep neural network with skip connections, when branches are scaled by $1/\sqrt{depth}$ (the only nontrivial scaling), results in the same covariance structure no matter how that limit is taken. This explains why the standard infinite-width-then-depth approach provides practical insights even for networks with depth of the same order as width…

    Submitted 10 August, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: 24 pages, 8 figures. arXiv admin note: text overlap with arXiv:2210.00688

  7. arXiv:2210.00688  [pdf, other]

    stat.ML cs.LG math.PR

    On the infinite-depth limit of finite-width neural networks

    Authors: Soufiane Hayou

    Abstract: In this paper, we study the infinite-depth limit of finite-width residual neural networks with random Gaussian weights. With proper scaling, we show that by fixing the width and taking the depth to infinity, the pre-activations converge in distribution to a zero-drift diffusion process. Unlike the infinite-width limit where the pre-activations converge weakly to a Gaussian random variable, we show…

    Submitted 12 January, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

    Comments: 71 pages, 21 figures

  8. arXiv:2202.10670  [pdf, other]

    stat.ML cs.LG

    From Optimization Dynamics to Generalization Bounds via Łojasiewicz Gradient Inequality

    Authors: Fusheng Liu, Haizhao Yang, Soufiane Hayou, Qianxiao Li

    Abstract: Optimization and generalization are two essential aspects of statistical machine learning. In this paper, we propose a framework to connect optimization with generalization by analyzing the generalization error based on the optimization trajectory under the gradient flow algorithm. The key ingredient of this framework is the Uniform-LGI, a property that is generally satisfied when training machine…

    Submitted 12 October, 2022; v1 submitted 21 February, 2022; originally announced February 2022.

    Journal ref: Transactions on Machine Learning Research 2022

  9. arXiv:2110.11804  [pdf, other]

    stat.ML cs.LG

    Probabilistic fine-tuning of pruning masks and PAC-Bayes self-bounded learning

    Authors: Soufiane Hayou, Bobby He, Gintare Karolina Dziugaite

    Abstract: We study an approach to learning pruning masks by optimizing the expected loss of stochastic pruning masks, i.e., masks which zero out each weight independently with some weight-specific probability. We analyze the training dynamics of the induced stochastic predictor in the setting of linear regression, and observe a data-adaptive L1 regularization term, in contrast to the data-adaptive L2 regular…

    Submitted 22 October, 2021; originally announced October 2021.

    Comments: 34 pages, 10 figures

  10. arXiv:2110.11749  [pdf, other]

    stat.ML cs.LG

    Feature Learning and Signal Propagation in Deep Neural Networks

    Authors: Yizhang Lou, Chris Mingard, Yoonsoo Nam, Soufiane Hayou

    Abstract: Recent work by Baratin et al. (2021) sheds light on an intriguing pattern that occurs during the training of deep neural networks: some layers align much more with data compared to other layers (where the alignment is defined as the Euclidean product of the tangent features matrix and the data labels matrix). The curve of the alignment as a function of layer index (generally) exhibits an ascent-de…

    Submitted 22 May, 2022; v1 submitted 22 October, 2021; originally announced October 2021.

    Comments: 35 pages

    Journal ref: International Conference on Machine Learning. PMLR, 2022

  11. arXiv:2106.03091  [pdf, other]

    stat.ML cs.LG

    Regularization in ResNet with Stochastic Depth

    Authors: Soufiane Hayou, Fadhel Ayed

    Abstract: Regularization plays a major role in modern deep learning. From classic techniques such as L1 and L2 penalties to other noise-based methods such as Dropout, regularization often yields better generalization properties by avoiding overfitting. Recently, Stochastic Depth (SD) has emerged as an alternative regularization technique for residual neural networks (ResNets) and has proven to boost the perform…

    Submitted 6 June, 2021; originally announced June 2021.

    Comments: 24 pages, 15 figures

  12. arXiv:2010.12859  [pdf, other]

    cs.LG stat.ML

    Stable ResNet

    Authors: Soufiane Hayou, Eugenio Clerico, Bobby He, George Deligiannidis, Arnaud Doucet, Judith Rousseau

    Abstract: Deep ResNet architectures have achieved state-of-the-art performance on many tasks. While they solve the problem of gradient vanishing, they might suffer from gradient exploding as the depth becomes large (Yang et al. 2017). Moreover, recent results have shown that ResNet might lose expressivity as the depth goes to infinity (Yang et al. 2017, Hayou et al. 2019). To resolve these issues, we introd…

    Submitted 18 March, 2021; v1 submitted 24 October, 2020; originally announced October 2020.

    Comments: 43 pages, 4 figures

  13. arXiv:2002.08797  [pdf, other]

    stat.ML cs.CV cs.LG

    Robust Pruning at Initialization

    Authors: Soufiane Hayou, Jean-Francois Ton, Arnaud Doucet, Yee Whye Teh

    Abstract: Overparameterized Neural Networks (NN) display state-of-the-art performance. However, there is a growing need for smaller, energy-efficient neural networks to be able to use machine learning applications on devices with limited computational resources. A popular approach consists of using pruning techniques. While these techniques have traditionally focused on pruning pre-trained NN (LeCun et al.,…

    Submitted 19 May, 2021; v1 submitted 19 February, 2020; originally announced February 2020.

    Comments: 37 pages, 12 figures

  14. arXiv:1905.13654  [pdf, other]

    stat.ML cs.LG

    Exact Convergence Rates of the Neural Tangent Kernel in the Large Depth Limit

    Authors: Soufiane Hayou, Arnaud Doucet, Judith Rousseau

    Abstract: Recent work by Jacot et al. (2018) has shown that training a neural network using gradient descent in parameter space is related to kernel gradient descent in function space with respect to the Neural Tangent Kernel (NTK). Lee et al. (2019) built on this result by establishing that the output of a neural network trained using gradient descent can be approximated by a linear model when the network…

    Submitted 25 May, 2022; v1 submitted 31 May, 2019; originally announced May 2019.

    Comments: 59 pages, 8 figures

  15. arXiv:1902.06853  [pdf, other]

    stat.ML cs.AI cs.LG

    On the Impact of the Activation Function on Deep Neural Networks Training

    Authors: Soufiane Hayou, Arnaud Doucet, Judith Rousseau

    Abstract: The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate selection can lead to the loss of information of the input during forward propagation and the exponential vanishing/exploding of gradients during back-propagation. Understanding the theoretical properties of untrained random networks is…

    Submitted 26 May, 2019; v1 submitted 18 February, 2019; originally announced February 2019.

    Comments: 22 pages

  16. arXiv:1805.08266  [pdf, other]

    stat.ML cs.LG

    On the Selection of Initialization and Activation Function for Deep Neural Networks

    Authors: Soufiane Hayou, Arnaud Doucet, Judith Rousseau

    Abstract: The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate selection can lead to the loss of information of the input during forward propagation and the exponential vanishing/exploding of gradients during back-propagation. Understanding the theoretical properties of untrained random networks is…

    Submitted 7 October, 2018; v1 submitted 21 May, 2018; originally announced May 2018.

    Comments: 8 pages, 15 figures