
Showing 1–10 of 10 results for author: Golikov, E

  1. arXiv:2208.13614

    cs.LG

    Neural Tangent Kernel: A Survey

    Authors: Eugene Golikov, Eduard Pokonechnyy, Vladimir Korviakov

    Abstract: A seminal work [Jacot et al., 2018] demonstrated that training a neural network under a specific parameterization is equivalent to performing a particular kernel method as width goes to infinity. This equivalence opened a promising direction for applying the results of the rich literature on kernel methods to neural nets, which were much harder to tackle. The present survey covers key results on kern…

    Submitted 29 August, 2022; originally announced August 2022.

    Comments: 47 pages, 8 figures

  2. arXiv:2205.15809

    stat.ML cs.AI cs.LG cs.NE

    Feature Learning in $L_{2}$-regularized DNNs: Attraction/Repulsion and Sparsity

    Authors: Arthur Jacot, Eugene Golikov, Clément Hongler, Franck Gabriel

    Abstract: We study the loss surface of DNNs with $L_{2}$ regularization. We show that the loss in terms of the parameters can be reformulated into a loss in terms of the layerwise activations $Z_{\ell}$ of the training set. This reformulation reveals the dynamics behind feature learning: the hidden representations $Z_{\ell}$ are optimal w.r.t. an attraction/repulsion problem and interpolate between the…

    Submitted 13 October, 2022; v1 submitted 31 May, 2022; originally announced May 2022.

  3. arXiv:2012.05760

    cs.LG cs.AI

    Notes on Deep Learning Theory

    Authors: Eugene A. Golikov

    Abstract: These are the notes for the lectures that I gave during Fall 2020 at the Moscow Institute of Physics and Technology (MIPT) and at the Yandex School of Data Analysis (YSDA). The notes cover some aspects of initialization, loss landscape, generalization, and neural tangent kernel theory. While many other topics (e.g. expressivity, mean-field theory, the double descent phenomenon) are missing…

    Submitted 10 December, 2020; originally announced December 2020.

    Comments: 68 pages

  4. arXiv:2006.06574

    cs.LG stat.ML

    Dynamically Stable Infinite-Width Limits of Neural Classifiers

    Authors: Eugene A. Golikov

    Abstract: Recent research has focused on two different approaches to studying neural network training in the limit of infinite width: (1) a mean-field (MF) approximation and (2) a constant neural tangent kernel (NTK) approximation. These two approaches scale hyperparameters differently with the width of a network layer and, as a result, yield different infinite-width limit models. We propose a general framework to…

    Submitted 22 October, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: 26 pages, 7 figures

  5. arXiv:2003.05884

    stat.ML cs.LG

    Towards a General Theory of Infinite-Width Limits of Neural Classifiers

    Authors: Eugene A. Golikov

    Abstract: Obtaining theoretical guarantees for neural network training appears to be a hard problem in the general case. Recent research has focused on studying this problem in the limit of infinite width, and two different theories have been developed: the mean-field (MF) and the constant kernel (NTK) limit theories. We propose a general framework that provides a link between these seemingly distinct theori…

    Submitted 23 October, 2020; v1 submitted 12 March, 2020; originally announced March 2020.

    Comments: 27 pages, 7 figures, accepted to ICML'2020

  6. arXiv:1911.05402

    math.OC cs.LG math.ST

    Quadratic number of nodes is sufficient to learn a dataset via gradient descent

    Authors: Biswarup Das, Eugene A. Golikov

    Abstract: We prove that if an activation function satisfies some mild conditions and the number of neurons in a two-layered fully connected neural network with this activation function is beyond a certain threshold, then gradient descent on a quadratic loss function finds the optimal weights of the input layer for global minima in linear time. This threshold value is an improvement over previously obtained values. We…

    Submitted 13 November, 2019; originally announced November 2019.

    Comments: Machine learning using neural networks, gradient descent, optimization, overparametrization regime

    MSC Class: 68T20

  7. arXiv:1905.07187

    cs.LG stat.ML

    An Essay on Optimization Mystery of Deep Learning

    Authors: Eugene Golikov

    Abstract: Despite the huge empirical success of deep learning, theoretical understanding of the learning process of neural networks is still lacking. This is why some of its features seem "mysterious". We emphasize two mysteries of deep learning: the generalization mystery and the optimization mystery. In this essay we review and draw connections between several selected works concerning the latter.

    Submitted 17 May, 2019; originally announced May 2019.

  8. arXiv:1812.02769

    cs.LG stat.ML

    Embedding-reparameterization procedure for manifold-valued latent variables in generative models

    Authors: Eugene Golikov, Maksim Kretov

    Abstract: The conventional prior for a Variational Auto-Encoder (VAE) is a Gaussian distribution. Recent works demonstrated that the choice of prior distribution affects the learning capacity of VAE models. We propose a general technique (embedding-reparameterization procedure, or ER) for introducing arbitrary manifold-valued variables in a VAE model. We compare our technique with a conventional VAE on a toy benchmark prob…

    Submitted 6 December, 2018; originally announced December 2018.

    Comments: Presented at Bayesian Deep Learning workshop (NeurIPS 2018)

  9. arXiv:1712.04708

    cs.CL cs.LG

    Differentiable lower bound for expected BLEU score

    Authors: Vlad Zhukov, Eugene Golikov, Maksim Kretov

    Abstract: In natural language processing tasks, model performance is often measured with a non-differentiable metric, such as the BLEU score. To use efficient gradient-based methods for optimization, a common workaround is to optimize some surrogate loss function. This approach is effective if optimizing such a loss also improves the target metric. The corresponding problem is referred t…

    Submitted 23 August, 2018; v1 submitted 13 December, 2017; originally announced December 2017.

    Comments: Presented at NIPS 2017 Workshop on Conversational AI: Today's Practice and Tomorrow's Potential

  10. arXiv:1711.07724

    cs.LG

    Using stochastic computation graphs formalism for optimization of sequence-to-sequence model

    Authors: Eugene Golikov, Vlad Zhukov, Maksim Kretov

    Abstract: A variety of machine learning problems can be formulated as an optimization task for some (surrogate) loss function. The calculation of the loss function can be viewed in terms of stochastic computation graphs (SCG). We use this formalism to analyze the problem of optimizing the well-known sequence-to-sequence model with attention and propose a reformulation of the task. Examples are given for machine translation…

    Submitted 15 December, 2017; v1 submitted 21 November, 2017; originally announced November 2017.

    Comments: Presented at 10th NIPS Workshop on Optimization for Machine Learning (NIPS 2017)