Showing 1–50 of 52 results for author: Şimşekli, U

Searching in archive cs.
  1. arXiv:2404.17442

    stat.ML cs.LG

    Uniform Generalization Bounds on Data-Dependent Hypothesis Sets via PAC-Bayesian Theory on Random Sets

    Authors: Benjamin Dupuis, Paul Viallard, George Deligiannidis, Umut Simsekli

    Abstract: We propose data-dependent uniform generalization bounds by approaching the problem from a PAC-Bayesian perspective. We first apply the PAC-Bayesian framework on 'random sets' in a rigorous way, where the training algorithm is assumed to output a data-dependent hypothesis set after observing the training data. This approach allows us to prove data-dependent bounds, which can be applicable in numero…

    Submitted 26 April, 2024; originally announced April 2024.

  2. arXiv:2403.02051

    stat.ML cs.CR cs.LG math.ST

    Differential Privacy of Noisy (S)GD under Heavy-Tailed Perturbations

    Authors: Umut Şimşekli, Mert Gürbüzbalaban, Sinan Yıldırım, Lingjiong Zhu

    Abstract: Injecting heavy-tailed noise into the iterates of stochastic gradient descent (SGD) has received increasing attention over the past few years. While various theoretical properties of the resulting algorithm have been analyzed mainly from learning theory and optimization perspectives, their privacy preservation properties have not yet been established. Aiming to bridge this gap, we provide differenti…

    Submitted 4 March, 2024; originally announced March 2024.

  3. arXiv:2402.12828

    stat.ML cs.LG math.OC

    SGD with Clipping is Secretly Estimating the Median Gradient

    Authors: Fabian Schaipp, Guillaume Garrigos, Umut Simsekli, Robert Gower

    Abstract: There are several applications of stochastic optimization where one can benefit from a robust estimate of the gradient. Examples include distributed learning with corrupted nodes, the presence of large outliers in the training data, learning under privacy constraints, or even heavy-tailed noise due to the dynamics of the algorithm itself. Here we study SGD with robust gradient estimato…

    Submitted 20 February, 2024; originally announced February 2024.

    MSC Class: 90C26; 68T07; 62-08

  4. arXiv:2402.08508

    stat.ML cs.LG

    A PAC-Bayesian Link Between Generalisation and Flat Minima

    Authors: Maxime Haddouche, Paul Viallard, Umut Simsekli, Benjamin Guedj

    Abstract: Modern machine learning usually involves predictors in the overparametrised setting (number of trained parameters greater than dataset size), and their training yields not only good performance on training data but also good generalisation capacity. This phenomenon challenges many theoretical results, and remains an open problem. To reach a better understanding, we provide novel generalisation bo…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: We provide novel PAC-Bayesian generalisation bounds involving gradient norms and being interpretable under the lens of flat minima

  5. arXiv:2402.07723

    stat.ML cs.LG

    Generalization Bounds for Heavy-Tailed SDEs through the Fractional Fokker-Planck Equation

    Authors: Benjamin Dupuis, Umut Şimşekli

    Abstract: Understanding the generalization properties of heavy-tailed stochastic optimization algorithms has attracted increasing attention over the past years. While illuminating interesting aspects of stochastic optimizers by using heavy-tailed stochastic differential equations as proxies, prior works either provided expected generalization bounds, or introduced non-computable information theoretic terms.…

    Submitted 3 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  6. arXiv:2402.05101

    stat.ML cs.LG

    Tighter Generalisation Bounds via Interpolation

    Authors: Paul Viallard, Maxime Haddouche, Umut Şimşekli, Benjamin Guedj

    Abstract: This paper contains a recipe for deriving new PAC-Bayes generalisation bounds based on the $(f, Γ)$-divergence, and, in addition, presents PAC-Bayes generalisation bounds where we interpolate between a series of probability divergences (including but not limited to KL, Wasserstein, and total variation), making the best out of many worlds depending on the posterior distributions' properties. We expl…

    Submitted 7 February, 2024; originally announced February 2024.

  7. arXiv:2310.18455

    cs.LG stat.ML

    Approximate Heavy Tails in Offline (Multi-Pass) Stochastic Gradient Descent

    Authors: Krunoslav Lehman Pavasovic, Alain Durmus, Umut Simsekli

    Abstract: A recent line of empirical studies has demonstrated that SGD might exhibit heavy-tailed behavior in practical settings, and the heaviness of the tails might correlate with the overall performance. In this paper, we investigate the emergence of such heavy tails. Previous works on this problem only considered, to the best of our knowledge, online (also called single-pass) SGD, in which the emergence of hea…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: In Neural Information Processing Systems (NeurIPS), Spotlight Presentation, 2023

  8. arXiv:2307.02501

    stat.ML cs.LG

    Generalization Guarantees via Algorithm-dependent Rademacher Complexity

    Authors: Sarah Sachs, Tim van Erven, Liam Hodgkinson, Rajiv Khanna, Umut Simsekli

    Abstract: Algorithm- and data-dependent generalization bounds are required to explain the generalization behavior of modern machine learning algorithms. In this context, there exist information-theoretic generalization bounds that involve (various forms of) mutual information, as well as bounds based on hypothesis set stability. We propose a conceptually related, but technically distinct complexity measure…

    Submitted 4 July, 2023; originally announced July 2023.

  9. arXiv:2306.08125

    stat.ML cs.LG math.PR

    Implicit Compressibility of Overparametrized Neural Networks Trained with Heavy-Tailed SGD

    Authors: Yijun Wan, Melih Barsbey, Abdellatif Zaidi, Umut Simsekli

    Abstract: Neural network compression has been an increasingly important subject, not only due to its practical relevance, but also due to its theoretical implications, as there is an explicit connection between compressibility and generalization error. Recent studies have shown that the choice of the hyperparameters of stochastic gradient descent (SGD) can have an effect on the compressibility of the learne…

    Submitted 12 February, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: 31 pages, 2 figures

  10. arXiv:2306.04375

    stat.ML cs.LG

    Learning via Wasserstein-Based High Probability Generalisation Bounds

    Authors: Paul Viallard, Maxime Haddouche, Umut Şimşekli, Benjamin Guedj

    Abstract: Minimising upper bounds on the population risk or the generalisation gap has been widely used in structural risk minimisation (SRM) -- this is in particular at the core of PAC-Bayesian learning. Despite its successes and unfailing surge of interest in recent years, a limitation of the PAC-Bayesian framework is that most bounds involve a Kullback-Leibler (KL) divergence term (or its variations), wh…

    Submitted 27 October, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: Accepted to NeurIPS 2023

  11. arXiv:2305.12056

    stat.ML cs.LG math.OC

    Uniform-in-Time Wasserstein Stability Bounds for (Noisy) Stochastic Gradient Descent

    Authors: Lingjiong Zhu, Mert Gurbuzbalaban, Anant Raj, Umut Simsekli

    Abstract: Algorithmic stability is an important notion that has proven powerful for deriving generalization bounds for practical algorithms. The last decade has witnessed an increasing number of stability bounds for different algorithms applied to different classes of loss functions. While these bounds have illuminated various properties of optimization algorithms, the analysis of each case typically requir…

    Submitted 28 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: 49 pages, NeurIPS 2023

  12. arXiv:2303.17109

    stat.ML cs.LG

    Efficient Sampling of Stochastic Differential Equations with Positive Semi-Definite Models

    Authors: Anant Raj, Umut Şimşekli, Alessandro Rudi

    Abstract: This paper deals with the problem of efficient sampling from a stochastic differential equation, given the drift function and the diffusion matrix. The proposed approach leverages a recent model for probabilities \cite{rudi2021psd} (the positive semi-definite -- PSD model) from which it is possible to obtain independent and identically distributed (i.i.d.) samples at precision $\varepsilon$ with a…

    Submitted 24 May, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

  13. arXiv:2302.05516

    stat.ML cs.LG math.OC

    Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD than Constant Stepsize

    Authors: Mert Gürbüzbalaban, Yuanhan Hu, Umut Şimşekli, Lingjiong Zhu

    Abstract: Cyclic and randomized stepsizes are widely used in deep learning practice and can often outperform standard stepsize choices such as constant stepsize in SGD. Despite their empirical success, not much is currently known about when and why they can theoretically improve the generalization performance. We consider a general class of Markovian stepsizes for learning, which contain i.i.d. random s…

    Submitted 29 August, 2023; v1 submitted 10 February, 2023; originally announced February 2023.

    Comments: To Appear

    Journal ref: Transactions of Machine Learning Research, 2023

  14. arXiv:2302.02766

    stat.ML cs.LG

    Generalization Bounds with Data-dependent Fractal Dimensions

    Authors: Benjamin Dupuis, George Deligiannidis, Umut Şimşekli

    Abstract: Providing generalization guarantees for modern neural networks has been a crucial task in statistical learning. Recently, several studies have attempted to analyze the generalization error in such settings by using tools from fractal geometry. While these works have successfully introduced new mathematical tools to apprehend generalization, they heavily rely on a Lipschitz continuity assumption, w…

    Submitted 10 July, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Journal ref: International Conference on Machine Learning (ICML 2023)

  15. arXiv:2301.11885

    stat.ML cs.LG

    Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions

    Authors: Anant Raj, Lingjiong Zhu, Mert Gürbüzbalaban, Umut Şimşekli

    Abstract: Heavy-tail phenomena in stochastic gradient descent (SGD) have been reported in several empirical studies. Experimental evidence in previous works suggests a strong interplay between the heaviness of the tails and generalization behavior of SGD. To address this empirical phenomenon theoretically, several works have made strong topological and statistical assumptions to link the generalization error…

    Submitted 30 January, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: The first two authors contributed equally to this work

  16. arXiv:2209.08951

    stat.ML cs.LG

    Generalization Bounds for Stochastic Gradient Descent via Localized $\varepsilon$-Covers

    Authors: Sejun Park, Umut Şimşekli, Murat A. Erdogdu

    Abstract: In this paper, we propose a new covering technique localized for the trajectories of SGD. This localization provides an algorithm-specific complexity measured by the covering number, which can have dimension-independent cardinality in contrast to standard uniform covering arguments that result in exponential dimension dependency. Based on this localized construction, we show that if the objective…

    Submitted 19 September, 2022; originally announced September 2022.

  17. arXiv:2206.01274

    stat.ML cs.LG

    Algorithmic Stability of Heavy-Tailed Stochastic Gradient Descent on Least Squares

    Authors: Anant Raj, Melih Barsbey, Mert Gürbüzbalaban, Lingjiong Zhu, Umut Şimşekli

    Abstract: Recent studies have shown that heavy tails can emerge in stochastic optimization and that the heaviness of the tails has links to the generalization error. While these studies have shed light on interesting aspects of the generalization behavior in modern settings, they relied on strong topological and statistical regularity assumptions, which are hard to verify in practice. Furthermore, it has b…

    Submitted 13 February, 2023; v1 submitted 2 June, 2022; originally announced June 2022.

    Comments: 50 pages

  18. arXiv:2205.11361

    stat.ML cs.LG math.DS math.PR

    Chaotic Regularization and Heavy-Tailed Limits for Deterministic Gradient Descent

    Authors: Soon Hoe Lim, Yijun Wan, Umut Şimşekli

    Abstract: Recent studies have shown that gradient descent (GD) can achieve improved generalization when its dynamics exhibit chaotic behavior. However, to obtain the desired effect, the step-size should be chosen sufficiently large, a task which is problem dependent and can be difficult in practice. In this study, we incorporate a chaotic component into GD in a controlled manner, and introduce multiscale p…

    Submitted 22 October, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: 24 pages, accepted at NeurIPS 2022

  19. arXiv:2205.06689

    stat.ML cs.LG math.OC

    Heavy-Tail Phenomenon in Decentralized SGD

    Authors: Mert Gurbuzbalaban, Yuanhan Hu, Umut Simsekli, Kun Yuan, Lingjiong Zhu

    Abstract: Recent theoretical studies have shown that heavy tails can emerge in stochastic optimization due to 'multiplicative noise', even under surprisingly simple settings, such as linear regression with Gaussian data. While these studies have uncovered several interesting phenomena, they consider conventional stochastic optimization problems, which exclude decentralized settings that naturally arise in m…

    Submitted 16 May, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

  20. arXiv:2203.02474

    stat.ML cs.IT cs.LG

    Rate-Distortion Theoretic Generalization Bounds for Stochastic Learning Algorithms

    Authors: Milad Sefidgaran, Amin Gohari, Gaël Richard, Umut Şimşekli

    Abstract: Understanding generalization in modern machine learning settings has been one of the major challenges in statistical learning theory. In this context, recent years have witnessed the development of various generalization bounds suggesting different complexity notions such as the mutual information between the data sample and the algorithm output, compressibility of the hypothesis space, and the fr…

    Submitted 29 June, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

    Comments: Accepted for presentation at the Conference on Learning Theory (COLT) 2022

  21. arXiv:2111.13171

    cs.LG cs.AI cs.CV math.GN stat.ML

    Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks

    Authors: Tolga Birdal, Aaron Lou, Leonidas Guibas, Umut Şimşekli

    Abstract: Disobeying the classical wisdom of statistical learning theory, modern deep neural networks generalize well even though they typically contain millions of parameters. Recently, it has been shown that the trajectories of iterative optimization algorithms can possess fractal structures, and their generalization error can be formally linked to the complexity of such fractals. This complexity is measu…

    Submitted 25 November, 2021; originally announced November 2021.

    Comments: Appears at NeurIPS 2021

  22. arXiv:2108.00781

    stat.ML cs.LG

    Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers

    Authors: Liam Hodgkinson, Umut Şimşekli, Rajiv Khanna, Michael W. Mahoney

    Abstract: Despite the ubiquitous use of stochastic optimization algorithms in machine learning, the precise impact of these algorithms and their dynamics on generalization performance in realistic non-convex settings is still poorly understood. While recent work has revealed connections between generalization and heavy-tailed behavior in stochastic optimization, this work mainly relied on continuous-time ap…

    Submitted 11 July, 2022; v1 submitted 2 August, 2021; originally announced August 2021.

    Comments: 22 pages, 6 figures

  23. arXiv:2106.15427

    stat.ML cs.LG

    Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections

    Authors: Kimia Nadjahi, Alain Durmus, Pierre E. Jacob, Roland Badeau, Umut Şimşekli

    Abstract: The Sliced-Wasserstein distance (SW) is being increasingly used in machine learning applications as an alternative to the Wasserstein distance and offers significant computational and statistical benefits. Since it is defined as an expectation over random projections, SW is commonly approximated by Monte Carlo. We adopt a new perspective to approximate SW by making use of the concentration of meas…

    Submitted 4 January, 2022; v1 submitted 29 June, 2021; originally announced June 2021.

    Comments: Published at NeurIPS 2021

  24. arXiv:2106.04881

    stat.ML cs.LG

    Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms

    Authors: Alexander Camuto, George Deligiannidis, Murat A. Erdogdu, Mert Gürbüzbalaban, Umut Şimşekli, Lingjiong Zhu

    Abstract: Understanding generalization in deep learning has been one of the major challenges in statistical learning theory over the last decade. While recent work has illustrated that the dataset and the training algorithm must be taken into account in order to obtain meaningful generalization bounds, it is still theoretically not clear which properties of the data and the algorithm determine the generaliz…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: 34 pages including Supplement, 4 Figures

  25. arXiv:2106.03795

    stat.ML cs.LG

    Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks

    Authors: Melih Barsbey, Milad Sefidgaran, Murat A. Erdogdu, Gaël Richard, Umut Şimşekli

    Abstract: Neural network compression techniques have become increasingly popular as they can drastically reduce the storage and computation requirements for very large networks. Recent empirical studies have illustrated that even simple pruning strategies can be surprisingly effective, and several theoretical studies have shown that compressible networks (in specific senses) should achieve a low generalizat…

    Submitted 7 June, 2021; originally announced June 2021.

  26. arXiv:2105.08399

    cs.LG cs.CL cs.SD eess.AS stat.ML

    Relative Positional Encoding for Transformers with Linear Complexity

    Authors: Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Şimşekli, Yi-Hsuan Yang, Gaël Richard

    Abstract: Recent advances in Transformer models allow for unprecedented sequence lengths, due to linear space and time complexity. In the meantime, relative positional encoding (RPE) was proposed as beneficial for classical Transformers and consists in exploiting lags instead of absolute positions for inference. Still, RPE is not available for the recent linear variants of the Transformer, because it requir…

    Submitted 10 June, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

    Comments: ICML 2021 (long talk) camera-ready. 24 pages

  27. arXiv:2102.07006

    stat.ML cs.LG

    Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections

    Authors: Alexander Camuto, Xiaoyu Wang, Lingjiong Zhu, Chris Holmes, Mert Gürbüzbalaban, Umut Şimşekli

    Abstract: Gaussian noise injections (GNIs) are a family of simple and widely-used regularisation methods for training neural networks, where one injects additive or multiplicative Gaussian noise into the network activations at every iteration of the optimisation algorithm, which is typically chosen as stochastic gradient descent (SGD). In this paper we focus on the so-called 'implicit effect' of GNIs, which i…

    Submitted 10 June, 2021; v1 submitted 13 February, 2021; originally announced February 2021.

    Comments: Main paper of 12 pages, followed by appendix

  28. arXiv:2102.05749

    cs.SD cs.LG eess.AS stat.ML

    Self-Supervised VQ-VAE for One-Shot Music Style Transfer

    Authors: Ondřej Cífka, Alexey Ozerov, Umut Şimşekli, Gaël Richard

    Abstract: Neural style transfer, which allows one to apply the artistic style of one image to another, has become one of the most widely showcased computer vision applications shortly after its introduction. In contrast, related tasks in the music audio domain remained, until recently, largely untackled. While several style conversion methods tailored to musical signals have been proposed, most lack the 'one-shot'…

    Submitted 10 June, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

    Comments: ICASSP 2021. Website: https://adasp.telecom-paris.fr/s/ss-vq-vae

    Journal ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (2021) 96-100

  29. arXiv:2007.07368

    stat.ML cs.LG

    Explicit Regularisation in Gaussian Noise Injections

    Authors: Alexander Camuto, Matthew Willetts, Umut Şimşekli, Stephen Roberts, Chris Holmes

    Abstract: We study the regularisation induced in neural networks by Gaussian noise injections (GNIs). Though such injections have been extensively studied when applied to data, there have been few studies on understanding the regularising effect they induce when applied to network activations. Here we derive the explicit regulariser of GNIs, obtained by marginalising out the injected noise, and show that it…

    Submitted 19 January, 2021; v1 submitted 14 July, 2020; originally announced July 2020.

    Journal ref: Advances in Neural Information Processing Systems 33 (2020)

  30. arXiv:2007.06352

    stat.ML cs.LG math.PR

    Quantitative Propagation of Chaos for SGD in Wide Neural Networks

    Authors: Valentin De Bortoli, Alain Durmus, Xavier Fontaine, Umut Simsekli

    Abstract: In this paper, we investigate the limiting behavior of a continuous-time counterpart of the Stochastic Gradient Descent (SGD) algorithm applied to two-layer overparameterized neural networks, as the number of neurons (i.e., the size of the hidden layer) $N \to +\infty$. Following a probabilistic approach, we show 'propagation of chaos' for the particle system defined by this continuous-time dynamics…

    Submitted 14 July, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

  31. Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks

    Authors: Umut Şimşekli, Ozan Sener, George Deligiannidis, Murat A. Erdogdu

    Abstract: Despite its success in a wide range of applications, characterizing the generalization properties of stochastic gradient descent (SGD) in non-convex deep learning problems is still an important challenge. While modeling the trajectories of SGD via stochastic differential equations (SDE) under heavy-tailed gradient noise has recently shed light on several peculiar characteristics of SGD, a rigoro…

    Submitted 22 May, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: Published at NeurIPS 2020 (Spotlight) -- an imprecision in Definition 2 and a mistake in the statement and the proof of Theorem 2 are fixed

  32. arXiv:2006.04740

    math.OC cs.LG math.ST

    The Heavy-Tail Phenomenon in SGD

    Authors: Mert Gurbuzbalaban, Umut Şimşekli, Lingjiong Zhu

    Abstract: In recent years, various notions of capacity and complexity have been proposed for characterizing the generalization properties of stochastic gradient descent (SGD) in deep learning. Some of the popular notions that correlate well with the performance on unseen data are (i) the 'flatness' of the local minimum found by SGD, which is related to the eigenvalues of the Hessian, (ii) the ratio of the s…

    Submitted 14 June, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

    Journal ref: Published as a conference paper at International Conference on Machine Learning (ICML) 2021

  33. arXiv:2004.00663

    cs.CV cs.GR cs.LG cs.RO stat.ML

    Synchronizing Probability Measures on Rotations via Optimal Transport

    Authors: Tolga Birdal, Michael Arbel, Umut Şimşekli, Leonidas Guibas

    Abstract: We introduce a new paradigm, measure synchronization, for synchronizing graphs with measure-valued edges. We formulate this problem as maximization of the cycle-consistency in the space of probability measures over relative rotations. In particular, we aim at estimating marginal distributions of absolute orientations by synchronizing the conditional ones, which are defined on…

    Submitted 1 April, 2020; originally announced April 2020.

    Comments: Accepted for publication at CVPR 2020, includes supplementary material. Project website: https://github.com/SynchInVision/probsync

  34. arXiv:2003.05783

    stat.ML cs.LG

    Statistical and Topological Properties of Sliced Probability Divergences

    Authors: Kimia Nadjahi, Alain Durmus, Lénaïc Chizat, Soheil Kolouri, Shahin Shahrampour, Umut Şimşekli

    Abstract: The idea of slicing divergences has been proven to be successful when comparing two probability measures in various machine learning applications including generative modeling, and consists in computing the expected value of a 'base divergence' between one-dimensional random projections of the two measures. However, the topological, statistical, and computational consequences of this technique hav…

    Submitted 4 January, 2022; v1 submitted 12 March, 2020; originally announced March 2020.

    Comments: Published at NeurIPS 2020 (Spotlight)

  35. arXiv:2002.12537

    stat.ML cs.LG

    Generalized Sliced Distances for Probability Distributions

    Authors: Soheil Kolouri, Kimia Nadjahi, Umut Simsekli, Shahin Shahrampour

    Abstract: Probability metrics have become an indispensable part of modern statistics and machine learning, and they play a quintessential role in various applications, including statistical hypothesis testing and generative modeling. However, in a practical setting, the convergence behavior of the algorithms built upon these distances has not been well established, except for a few specific cases. In this…

    Submitted 27 February, 2020; originally announced February 2020.

  36. arXiv:2002.05685

    stat.ML cs.LG

    Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise

    Authors: Umut Şimşekli, Lingjiong Zhu, Yee Whye Teh, Mert Gürbüzbalaban

    Abstract: Stochastic gradient descent with momentum (SGDm) is one of the most popular optimization algorithms in deep learning. While there is a rich theory of SGDm for convex problems, the theory is considerably less developed in the context of deep learning where the problem is non-convex and the gradient noise might exhibit a heavy-tailed behavior, as empirically observed in recent studies. In this study…

    Submitted 4 November, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

    Comments: 20 pages, Published at International Conference on Machine Learning 2020

  37. arXiv:1912.00018

    stat.ML cs.LG math.CA

    On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks

    Authors: Umut Şimşekli, Mert Gürbüzbalaban, Thanh Huy Nguyen, Gaël Richard, Levent Sagun

    Abstract: The gradient noise (GN) in the stochastic gradient descent (SGD) algorithm is often considered to be Gaussian in the large data regime by assuming that the classical central limit theorem (CLT) kicks in. This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion. We argue that the Ga…

    Submitted 29 November, 2019; originally announced December 2019.

    Comments: 32 pages. arXiv admin note: substantial text overlap with arXiv:1901.06053

  38. arXiv:1910.08701

    math.OC cs.LG stat.ML

    Robust Distributed Accelerated Stochastic Gradient Methods for Multi-Agent Networks

    Authors: Alireza Fallah, Mert Gurbuzbalaban, Asuman Ozdaglar, Umut Simsekli, Lingjiong Zhu

    Abstract: We study the distributed stochastic gradient (D-SG) method and its accelerated variant (D-ASG) for solving decentralized strongly convex stochastic optimization problems where the objective function is distributed over several computational units, lying on a fixed but arbitrary connected communication graph, subject to local communication constraints where noisy estimates of the gradients are availabl…

    Submitted 4 October, 2021; v1 submitted 19 October, 2019; originally announced October 2019.

  39. arXiv:1907.02265

    cs.SD cs.LG eess.AS stat.ML

    Supervised Symbolic Music Style Translation Using Synthetic Data

    Authors: Ondřej Cífka, Umut Şimşekli, Gaël Richard

    Abstract: Research on style transfer and domain translation has clearly demonstrated the ability of deep learning-based algorithms to manipulate images in terms of artistic style. More recently, several attempts have been made to extend such approaches to music (both symbolic and audio) in order to enable transforming musical style in a similar manner. In this study, we focus on symbolic music with the goal…

    Submitted 4 July, 2019; originally announced July 2019.

    Comments: ISMIR 2019 camera-ready

    Journal ref: Proceedings of the 20th International Society for Music Information Retrieval Conference (2019) 588-595

  40. arXiv:1906.09069

    stat.ML cs.LG

    First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise

    Authors: Thanh Huy Nguyen, Umut Şimşekli, Mert Gürbüzbalaban, Gaël Richard

    Abstract: Stochastic gradient descent (SGD) has been widely used in machine learning due to its computational efficiency and favorable generalization properties. Recently, it has been empirically demonstrated that the gradient noise in several deep learning settings admits a non-Gaussian, heavy-tailed behavior. This suggests that the gradient noise can be modeled by using $α$-stable distributions, a family…

    Submitted 21 June, 2019; originally announced June 2019.

  41. arXiv:1906.04516

    stat.ML cs.LG

    Asymptotic Guarantees for Learning Generative Models with the Sliced-Wasserstein Distance

    Authors: Kimia Nadjahi, Alain Durmus, Umut Şimşekli, Roland Badeau

    Abstract: Minimum expected distance estimation (MEDE) algorithms have been widely used for probabilistic models with intractable likelihood functions and they have become increasingly popular due to their use in implicit generative modeling (e.g. Wasserstein generative adversarial networks, Wasserstein autoencoders). Emerging from computational optimal transport, the Sliced-Wasserstein (SW) distance has bec…

    Submitted 24 March, 2020; v1 submitted 11 June, 2019; originally announced June 2019.

    Comments: Accepted at NeurIPS 2019 (publication and spotlight presentation)

  42. arXiv:1904.05814  [pdf, other]

    cs.CV cs.GR cs.LG cs.RO math.NA

    Probabilistic Permutation Synchronization using the Riemannian Structure of the Birkhoff Polytope

    Authors: Tolga Birdal, Umut Şimşekli

    Abstract: We present an entirely new geometric and probabilistic approach to synchronization of correspondences across multiple sets of objects or images. In particular, we present two algorithms: (1) Birkhoff-Riemannian L-BFGS for optimizing the relaxed version of the combinatorially intractable cycle consistency loss in a principled manner, (2) Birkhoff-Riemannian Langevin Monte Carlo for generating sampl…

    Submitted 11 April, 2019; originally announced April 2019.

    Comments: To appear as an oral presentation at CVPR 2019. 20 pages including the supplementary material

  43. arXiv:1903.04478  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    Bayesian Allocation Model: Inference by Sequential Monte Carlo for Nonnegative Tensor Factorizations and Topic Models using Polya Urns

    Authors: Ali Taylan Cemgil, Mehmet Burak Kurutmaz, Sinan Yildirim, Melih Barsbey, Umut Simsekli

    Abstract: We introduce a dynamic generative model, Bayesian allocation model (BAM), which establishes explicit connections between nonnegative tensor factorization (NTF), graphical models of discrete probability distributions and their Bayesian extensions, and the topic models such as the latent Dirichlet allocation. BAM is based on a Poisson process, whose events are marked by using a Bayesian network, whe…

    Submitted 11 March, 2019; originally announced March 2019.

    Comments: 70 pages, 16 figures

  44. arXiv:1902.03926  [pdf, other]

    cs.SD eess.AS stat.ML

    Speech enhancement with variational autoencoders and alpha-stable distributions

    Authors: Simon Leglaive, Umut Simsekli, Antoine Liutkus, Laurent Girin, Radu Horaud

    Abstract: This paper focuses on single-channel semi-supervised speech enhancement. We learn a speaker-independent deep generative speech model using the framework of variational autoencoders. The noise model remains unsupervised because we do not assume prior knowledge of the noisy recording environment. In this context, our contribution is to propose a noise model based on alpha-stable distributions, inste…

    Submitted 8 February, 2019; originally announced February 2019.

    Comments: 5 pages, 3 figures, audio examples and code available online: https://team.inria.fr/perception/research/icassp2019-asvae/. arXiv admin note: text overlap with arXiv:1811.06713

    Report number: hal-02005106

    Journal ref: IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Brighton, UK, May 2019, pp. 541-545

  45. arXiv:1902.00434  [pdf, other]

    cs.LG stat.ML

    Generalized Sliced Wasserstein Distances

    Authors: Soheil Kolouri, Kimia Nadjahi, Umut Simsekli, Roland Badeau, Gustavo K. Rohde

    Abstract: The Wasserstein distance and its variations, e.g., the sliced-Wasserstein (SW) distance, have recently drawn attention from the machine learning community. The SW distance, specifically, was shown to have similar properties to the Wasserstein distance, while being much simpler to compute, and is therefore used in various applications including generative modeling and general supervised/unsupervise…

    Submitted 1 February, 2019; originally announced February 2019.

  46. arXiv:1901.07487  [pdf, other]

    math.OC cs.LG stat.ML

    Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization

    Authors: Thanh Huy Nguyen, Umut Şimşekli, Gaël Richard

    Abstract: Recent studies on diffusion-based sampling methods have shown that Langevin Monte Carlo (LMC) algorithms can be beneficial for non-convex optimization, and rigorous theoretical guarantees have been proven for both asymptotic and finite-time regimes. Algorithmically, LMC-based algorithms resemble the well-known gradient descent (GD) algorithm, where the GD recursion is perturbed by an additive Gaus…

    Submitted 22 January, 2019; originally announced January 2019.

  47. arXiv:1901.06053  [pdf, other]

    cs.LG stat.ML

    A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks

    Authors: Umut Simsekli, Levent Sagun, Mert Gurbuzbalaban

    Abstract: The gradient noise (GN) in the stochastic gradient descent (SGD) algorithm is often considered to be Gaussian in the large data regime by assuming that the classical central limit theorem (CLT) kicks in. This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion. We argue that the Gaussiani…

    Submitted 17 January, 2019; originally announced January 2019.

  48. arXiv:1806.08141  [pdf, other]

    stat.ML cs.LG

    Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions

    Authors: Antoine Liutkus, Umut Şimşekli, Szymon Majewski, Alain Durmus, Fabian-Robert Stöter

    Abstract: By building upon the recent theory that established the connection between implicit generative modeling (IGM) and optimal transport, in this study, we propose a novel parameter-free algorithm for learning the underlying distributions of complicated datasets and sampling from them. The proposed algorithm is based on a functional optimization problem, which aims at finding a measure that is close to…

    Submitted 11 June, 2019; v1 submitted 21 June, 2018; originally announced June 2018.

    Comments: Published at the International Conference on Machine Learning (ICML) 2019

  49. arXiv:1806.02617  [pdf, other]

    stat.ML cs.LG

    Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization

    Authors: Umut Şimşekli, Çağatay Yıldız, Thanh Huy Nguyen, Gaël Richard, A. Taylan Cemgil

    Abstract: Recent studies have illustrated that stochastic gradient Markov Chain Monte Carlo techniques have a strong potential in non-convex optimization, where local and global convergence guarantees can be shown under certain conditions. By building upon this recent theory, in this study, we develop an asynchronous-parallel stochastic L-BFGS algorithm for non-convex optimization. The proposed algorithm i…

    Submitted 7 June, 2018; originally announced June 2018.

    Comments: Published in the International Conference on Machine Learning (ICML 2018)

  50. arXiv:1805.12279  [pdf, other]

    cs.CV cs.AI cs.CG cs.RO stat.ML

    Bayesian Pose Graph Optimization via Bingham Distributions and Tempered Geodesic MCMC

    Authors: Tolga Birdal, Umut Şimşekli, M. Onur Eken, Slobodan Ilic

    Abstract: We introduce the Tempered Geodesic Markov Chain Monte Carlo (TG-MCMC) algorithm for initializing pose graph optimization problems, arising in various scenarios such as SFM (structure from motion) or SLAM (simultaneous localization and mapping). TG-MCMC is the first of its kind, as it unites asymptotically global non-convex optimization on the spherical manifold of quaternions with posterior sampling, in or…

    Submitted 30 March, 2019; v1 submitted 30 May, 2018; originally announced May 2018.

    Comments: Published at NeurIPS 2018, 25 pages with supplements