Showing 1–10 of 10 results for author: Gabriel, F

Searching in archive cs.

  1. arXiv:2205.15809  [pdf, other]

    stat.ML cs.AI cs.LG cs.NE

    Feature Learning in $L_{2}$-regularized DNNs: Attraction/Repulsion and Sparsity

    Authors: Arthur Jacot, Eugene Golikov, Clément Hongler, Franck Gabriel

    Abstract: We study the loss surface of DNNs with $L_{2}$ regularization. We show that the loss in terms of the parameters can be reformulated into a loss in terms of the layerwise activations $Z_{\ell}$ of the training set. This reformulation reveals the dynamics behind feature learning: each hidden representation $Z_{\ell}$ is optimal w.r.t. an attraction/repulsion problem and interpolates between the…

    Submitted 13 October, 2022; v1 submitted 31 May, 2022; originally announced May 2022.

  2. arXiv:2106.15933  [pdf, other]

    stat.ML cs.LG

    Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity

    Authors: Arthur Jacot, François Ged, Berfin Şimşek, Clément Hongler, Franck Gabriel

    Abstract: The dynamics of Deep Linear Networks (DLNs) is dramatically affected by the variance $σ^2$ of the parameters at initialization $θ_0$. For DLNs of width $w$, we show a phase transition w.r.t. the scaling $γ$ of the variance $σ^2=w^{-γ}$ as $w\to\infty$: for large variance ($γ<1$), $θ_0$ is very close to a global minimum but far from any saddle point, and for small variance ($γ>1$), $θ_0$ is close t…

    Submitted 31 January, 2022; v1 submitted 30 June, 2021; originally announced June 2021.

  3. arXiv:2102.03044  [pdf, other]

    cs.GT cs.CL cs.LO cs.SI

    Smart Proofs via Smart Contracts: Succinct and Informative Mathematical Derivations via Decentralized Markets

    Authors: Sylvain Carré, Franck Gabriel, Clément Hongler, Gustavo Lacerda, Gloria Capano

    Abstract: Modern mathematics is built on the idea that proofs should be translatable into formal proofs, whose validity is an objective question, decidable by a computer. Yet, in practice, proofs are informal and may omit many details. An agent considers a proof valid if they trust that it could be expanded into a machine-verifiable proof. A proof's validity can thus become a subjective matter and lead to a…

    Submitted 13 October, 2021; v1 submitted 5 February, 2021; originally announced February 2021.

    Comments: 45 pages, 12 figures

    MSC Class: 03F07; 03F20; 91A05; 91A06; 91A07; 91A10; 91A11; 91A24; 91A26; 91A27; 91A28; 91A80; 68N17; 68P05; 68V15; 68V20; 68V30

    ACM Class: F.4

  4. arXiv:2006.09796  [pdf, other]

    stat.ML cs.LG math.PR

    Kernel Alignment Risk Estimator: Risk Prediction from Training Data

    Authors: Arthur Jacot, Berfin Şimşek, Francesco Spadaro, Clément Hongler, Franck Gabriel

    Abstract: We study the risk (i.e. generalization error) of Kernel Ridge Regression (KRR) for a kernel $K$ with ridge $λ>0$ and i.i.d. observations. For this, we introduce two objects: the Signal Capture Threshold (SCT) and the Kernel Alignment Risk Estimator (KARE). The SCT $\vartheta_{K,λ}$ is a function of the data distribution: it can be used to identify the components of the data that the KRR predictor…

    Submitted 17 June, 2020; originally announced June 2020.

  5. arXiv:2002.08404  [pdf, other]

    stat.ML cs.LG

    Implicit Regularization of Random Feature Models

    Authors: Arthur Jacot, Berfin Şimşek, Francesco Spadaro, Clément Hongler, Franck Gabriel

    Abstract: Random Feature (RF) models are used as efficient parametric approximations of kernel methods. We investigate, by means of random matrix theory, the connection between Gaussian RF models and Kernel Ridge Regression (KRR). For a Gaussian RF model with $P$ features, $N$ data points, and a ridge $λ$, we show that the average (i.e. expected) RF predictor is close to a KRR predictor with an effective ri…

    Submitted 23 September, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

    Journal ref: Proceedings of the International Conference on Machine Learning, 2020, pp. 7397-7406

  6. arXiv:1910.02875  [pdf, other]

    cs.LG cs.NE stat.ML

    The asymptotic spectrum of the Hessian of DNN throughout training

    Authors: Arthur Jacot, Franck Gabriel, Clément Hongler

    Abstract: The dynamics of DNNs during gradient descent is described by the so-called Neural Tangent Kernel (NTK). In this article, we show that the NTK allows one to gain precise insight into the Hessian of the cost of DNNs. When the NTK is fixed during training, we obtain a full characterization of the asymptotics of the spectrum of the Hessian, at initialization and during training. In the so-called mean-…

    Submitted 10 February, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

  7. arXiv:1907.05715  [pdf, other]

    cs.LG stat.ML

    Order and Chaos: NTK views on DNN Normalization, Checkerboard and Boundary Artifacts

    Authors: Arthur Jacot, Franck Gabriel, François Ged, Clément Hongler

    Abstract: We analyze architectural features of Deep Neural Networks (DNNs) using the so-called Neural Tangent Kernel (NTK), which describes the training and generalization of DNNs in the infinite-width setting. In this setting, we show that for fully-connected DNNs, as the depth grows, two regimes appear: "order", where the (scaled) NTK converges to a constant, and "chaos", where it converges to a Kronecker…

    Submitted 22 June, 2020; v1 submitted 11 July, 2019; originally announced July 2019.

  8. arXiv:1901.01608  [pdf, other]

    cond-mat.dis-nn cs.LG

    Scaling description of generalization with number of parameters in deep learning

    Authors: Mario Geiger, Arthur Jacot, Stefano Spigler, Franck Gabriel, Levent Sagun, Stéphane d'Ascoli, Giulio Biroli, Clément Hongler, Matthieu Wyart

    Abstract: Supervised deep learning involves the training of neural networks with a large number $N$ of parameters. For large enough $N$, in the so-called over-parametrized regime, one can essentially fit the training data points. Sparsity-based arguments would suggest that the generalization error increases as $N$ grows past a certain threshold $N^{*}$. Instead, empirical studies have shown that in the over…

    Submitted 8 October, 2019; v1 submitted 6 January, 2019; originally announced January 2019.

    Comments: The clarity of the text has been improved: the section "Related works" has been updated and the section "3.1 Regression task" has been added

  9. arXiv:1806.07572  [pdf, other]

    cs.LG cs.NE math.PR stat.ML

    Neural Tangent Kernel: Convergence and Generalization in Neural Networks

    Authors: Arthur Jacot, Franck Gabriel, Clément Hongler

    Abstract: At initialization, artificial neural networks (ANNs) are equivalent to Gaussian processes in the infinite-width limit, thus connecting them to kernel methods. We prove that the evolution of an ANN during training can also be described by a kernel: during gradient descent on the parameters of an ANN, the network function $f_θ$ (which maps input vectors to output vectors) follows the kernel gradient…

    Submitted 10 February, 2020; v1 submitted 20 June, 2018; originally announced June 2018.

    Journal ref: Advances in Neural Information Processing Systems, 2018, pp. 8571-8580

  10. arXiv:1802.00521  [pdf, other]

    cs.NI

    Multipath Communication with Finite Sliding Window Network Coding for Ultra-Reliability and Low Latency

    Authors: Frank Gabriel, Anil Kumar Chorppath, Ievgenii Tsokalo, Frank H. P. Fitzek

    Abstract: We use a random linear network coding (RLNC) based scheme for multipath communication in the presence of lossy links with different delay characteristics to obtain ultra-reliability and low latency. A sliding window version of RLNC is proposed where the coded packets are generated using packets in a window size and are inserted among systematic packets in different paths. The packets are scheduled i…

    Submitted 1 February, 2018; originally announced February 2018.