Showing 1–5 of 5 results for author: Ged, F

  1. arXiv:2402.05626

    cs.LG

    Loss Landscape of Shallow ReLU-like Neural Networks: Stationary Points, Saddle Escaping, and Network Embedding

    Authors: Zhengqing Wu, Berfin Simsek, Francois Ged

    Abstract: In this paper, we investigate the loss landscape of one-hidden-layer neural networks with ReLU-like activation functions trained with the empirical squared loss. As the activation function is non-differentiable, it is so far unclear how to completely characterize the stationary points. We propose the conditions for stationarity that apply to both non-differentiable and differentiable cases. Additi…

    Submitted 11 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

  2. arXiv:2303.12785

    cs.LG cs.AI

    Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality

    Authors: François Ged, Maria Han Veiga

    Abstract: A novel Policy Gradient (PG) algorithm, called Matryoshka Policy Gradient (MPG), is introduced and studied, in the context of max-entropy reinforcement learning, where an agent aims at maximising entropy bonuses additional to its cumulative rewards. MPG differs from standard PG in that it trains a sequence of policies to learn finite horizon tasks simultaneously, instead of a single policy for the…

    Submitted 25 June, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

    MSC Class: 68T07 ACM Class: I.2.0; I.2.6

  3. arXiv:2106.15933

    stat.ML cs.LG

    Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity

    Authors: Arthur Jacot, François Ged, Berfin Şimşek, Clément Hongler, Franck Gabriel

    Abstract: The dynamics of Deep Linear Networks (DLNs) is dramatically affected by the variance $σ^2$ of the parameters at initialization $θ_0$. For DLNs of width $w$, we show a phase transition w.r.t. the scaling $γ$ of the variance $σ^2=w^{-γ}$ as $w\to\infty$: for large variance ($γ<1$), $θ_0$ is very close to a global minimum but far from any saddle point, and for small variance ($γ>1$), $θ_0$ is close t…

    Submitted 31 January, 2022; v1 submitted 30 June, 2021; originally announced June 2021.

  4. arXiv:2105.12221

    cs.LG

    Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances

    Authors: Berfin Şimşek, François Ged, Arthur Jacot, Francesco Spadaro, Clément Hongler, Wulfram Gerstner, Johanni Brea

    Abstract: We study how permutation symmetries in overparameterized multi-layer neural networks generate 'symmetry-induced' critical points. Assuming a network with $ L $ layers of minimal widths $ r_1^*, \ldots, r_{L-1}^* $ reaches a zero-loss minimum at $ r_1^*! \cdots r_{L-1}^*! $ isolated points that are permutations of one another, we show that adding one extra neuron to each layer is sufficient to conn…

    Submitted 12 September, 2021; v1 submitted 25 May, 2021; originally announced May 2021.

    Comments: 29 pages, 12 figures, ICML 2021

  5. arXiv:1907.05715

    cs.LG stat.ML

    Order and Chaos: NTK views on DNN Normalization, Checkerboard and Boundary Artifacts

    Authors: Arthur Jacot, Franck Gabriel, François Ged, Clément Hongler

    Abstract: We analyze architectural features of Deep Neural Networks (DNNs) using the so-called Neural Tangent Kernel (NTK), which describes the training and generalization of DNNs in the infinite-width setting. In this setting, we show that for fully-connected DNNs, as the depth grows, two regimes appear: "order", where the (scaled) NTK converges to a constant, and "chaos", where it converges to a Kronecker…

    Submitted 22 June, 2020; v1 submitted 11 July, 2019; originally announced July 2019.