Skip to main content

Showing 1–50 of 193 results for author: Bach, F

Searching in archive cs. Search in all archives.
.
  1. arXiv:2402.07514  [pdf, other

    cs.AI math.ST

    Physics-informed machine learning as a kernel method

    Authors: Nathan Doumèche, Francis Bach, Gérard Biau, Claire Boyer

    Abstract: Physics-informed machine learning combines the expressiveness of data-based approaches with the interpretability of physical models. In this context, we consider a general regression problem where the empirical risk is regularized by a partial differential equation that quantifies the physical inconsistency. We prove that for linear differential priors, the problem can be formulated as a kernel re… ▽ More

    Submitted 19 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  2. arXiv:2311.12436  [pdf, other

    cs.LG

    Classifier Calibration with ROC-Regularized Isotonic Regression

    Authors: Eugene Berta, Francis Bach, Michael Jordan

    Abstract: Calibration of machine learning classifiers is necessary to obtain reliable and interpretable predictions, bridging the gap between model confidence and actual probabilities. One prominent technique, isotonic regression (IR), aims at calibrating binary classifiers by minimizing the cross entropy on a calibration set via monotone transformations. IR acts as an adaptive binning procedure, which allo… ▽ More

    Submitted 21 November, 2023; originally announced November 2023.

  3. arXiv:2310.10807  [pdf, other

    stat.ML cs.CR cs.LG math.OC

    Regularization properties of adversarially-trained linear regression

    Authors: Antônio H. Ribeiro, Dave Zachariah, Francis Bach, Thomas B. Schön

    Abstract: State-of-the-art machine learning models can be vulnerable to very small input perturbations that are adversarially constructed. Adversarial training is an effective approach to defend against it. Formulated as a min-max problem, it searches for the best solution when the training data were corrupted by the worst-case attacks. Linear models are among the simple models where vulnerabilities can be… ▽ More

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted (spotlight) NeurIPS 2023; A preliminary version of this work titled: "Surprises in adversarially-trained linear regression" was made available under a different identifier: arXiv:2205.12695

  4. Variational Gaussian approximation of the Kushner optimal filter

    Authors: Marc Lambert, Silvère Bonnabel, Francis Bach

    Abstract: In estimation theory, the Kushner equation provides the evolution of the probability density of the state of a dynamical system given continuous-time observations. Building upon our recent work, we propose a new way to approximate the solution of the Kushner equation through tractable variational Gaussian approximations of two proximal losses associated with the propagation and Bayesian update of… ▽ More

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: Lecture Notes in Computer Science, 2023

  5. arXiv:2307.12754  [pdf, other

    stat.ME cs.AI cs.LG math.ST

    Nonparametric Linear Feature Learning in Regression Through Regularisation

    Authors: Bertille Follain, Francis Bach

    Abstract: Representation learning plays a crucial role in automated feature selection, particularly in the context of high-dimensional data, where non-parametric methods often struggle. In this study, we focus on supervised learning scenarios where the pertinent information resides within a lower-dimensional linear subspace of the data, namely the multi-index model. If this subspace were known, it would gre… ▽ More

    Submitted 5 March, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: 42 pages, 5 figures

    MSC Class: 62G08; 62F10 (Primary); 65K10 (Secondary) ACM Class: I.2.6

  6. arXiv:2306.16255  [pdf, other

    math.OC cs.IT math.ST

    Theory and applications of the Sum-Of-Squares technique

    Authors: Francis Bach, Elisabetta Cornacchia, Luca Pesce, Giovanni Piccioli

    Abstract: The Sum-of-Squares (SOS) approximation method is a technique used in optimization problems to derive lower bounds on the optimal value of an objective function. By representing the objective function as a sum of squares in a feature space, the SOS method transforms non-convex global optimization problems into solvable semidefinite programs. This note presents an overview of the SOS method. We star… ▽ More

    Submitted 11 March, 2024; v1 submitted 28 June, 2023; originally announced June 2023.

    Comments: These are notes from the lecture of Francis Bach given at the summer school "Statistical Physics & Machine Learning", that took place in Les Houches School of Physics in France from 4th to 29th July 2022. The school was organized by Florent Krzakala and Lenka Zdeborová from EPFL. 19 pages, 4 figures

  7. arXiv:2306.00742  [pdf, other

    cs.LG cs.AI stat.ML

    The Galerkin method beats Graph-Based Approaches for Spectral Algorithms

    Authors: Vivien Cabannes, Francis Bach

    Abstract: Historically, the machine learning community has derived spectral decompositions from graph-based approaches. We break with this approach and prove the statistical and computational superiority of the Galerkin method, which consists in restricting the study to a small set of test functions. In particular, we introduce implementation tricks to deal with differential operators in large dimensions wi… ▽ More

    Submitted 26 February, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Journal ref: AISTATS 2024

  8. arXiv:2305.19473  [pdf, other

    stat.ML cs.LG stat.CO

    Chain of Log-Concave Markov Chains

    Authors: Saeed Saremi, Ji Won Park, Francis Bach

    Abstract: We introduce a theoretical framework for sampling from unnormalized densities based on a smoothing scheme that uses an isotropic Gaussian kernel with a single fixed noise scale. We prove one can decompose sampling from a density (minimal assumptions made on the density) into a sequence of sampling from log-concave conditional densities via accumulation of noisy measurements with equal noise levels… ▽ More

    Submitted 28 September, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

  9. arXiv:2305.18399  [pdf, other

    cs.LG cs.AI stat.ML

    On the impact of activation and normalization in obtaining isometric embeddings at initialization

    Authors: Amir Joudaki, Hadi Daneshmand, Francis Bach

    Abstract: In this paper, we explore the structure of the penultimate Gram matrix in deep neural networks, which contains the pairwise inner products of outputs corresponding to a batch of inputs. In several architectures it has been observed that this Gram matrix becomes degenerate with depth at initialization, which dramatically slows training. Normalization layers, such as batch or layer normalization, pl… ▽ More

    Submitted 17 November, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

  10. arXiv:2305.16358  [pdf, other

    cs.LG cs.AI stat.ML

    Differentiable Clustering with Perturbed Spanning Forests

    Authors: Lawrence Stewart, Francis S Bach, Felipe Llinares López, Quentin Berthet

    Abstract: We introduce a differentiable clustering method based on stochastic perturbations of minimum-weight spanning forests. This allows us to include clustering in end-to-end trainable pipelines, with efficient gradients. We show that our method performs well even in difficult settings, such as data sets with high noise and challenging geometries. We also formulate an ad hoc loss to efficiently learn fr… ▽ More

    Submitted 6 November, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Journal ref: 37th Conference on Neural Information Processing Systems, Dec 2023, New Orleans, United States

  11. arXiv:2303.14195  [pdf, other

    cs.DS

    The limited-memory recursive variational Gaussian approximation (L-RVGA)

    Authors: Marc Lambert, Silvère Bonnabel, Francis Bach

    Abstract: We consider the problem of computing a Gaussian approximation to the posterior distribution of a parameter given a large number N of observations and a Gaussian prior, when the dimension of the parameter d is also large. To address this problem we build on a recently introduced recursive algorithm for variational Gaussian approximation of the posterior, called recursive variational Gaussian approx… ▽ More

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: Statistics and Computing, In press

  12. arXiv:2303.11669  [pdf, other

    stat.ML cs.LG

    Universal Smoothed Score Functions for Generative Modeling

    Authors: Saeed Saremi, Rupesh Kumar Srivastava, Francis Bach

    Abstract: We consider the problem of generative modeling based on smoothing an unknown density of interest in $\mathbb{R}^d$ using factorial kernels with $M$ independent Gaussian channels with equal noise levels introduced by Saremi and Srivastava (2022). First, we fully characterize the time complexity of learning the resulting smoothed density in $\mathbb{R}^{Md}$, called M-density, by deriving a universa… ▽ More

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: Technical Report

  13. arXiv:2303.09532  [pdf, ps, other

    math.OC cs.LG

    Variational Principles for Mirror Descent and Mirror Langevin Dynamics

    Authors: Belinda Tzen, Anant Raj, Maxim Raginsky, Francis Bach

    Abstract: Mirror descent, introduced by Nemirovski and Yudin in the 1970s, is a primal-dual convex optimization method that can be tailored to the geometry of the optimization problem at hand through the choice of a strongly convex potential function. It arises as a basic primitive in a variety of applications, including large-scale optimization, machine learning, and control. This paper proposes a variatio… ▽ More

    Submitted 16 March, 2023; originally announced March 2023.

    Comments: 6 pages

  14. arXiv:2303.03237  [pdf, other

    stat.ML cs.LG math.ST stat.CO

    Convergence Rates for Non-Log-Concave Sampling and Log-Partition Estimation

    Authors: David Holzmüller, Francis Bach

    Abstract: Sampling from Gibbs distributions $p(x) \propto \exp(-V(x)/\varepsilon)$ and computing their log-partition function are fundamental tasks in statistics, machine learning, and statistical physics. However, while efficient algorithms are known for convex potentials $V$, the situation is much more difficult in the non-convex case, where algorithms necessarily suffer from the curse of dimensionality i… ▽ More

    Submitted 1 August, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: Changes in v3: Minor corrections and improvements. Plots can be reproduced using the code at https://github.com/dholzmueller/sampling_experiments

  15. arXiv:2303.01372  [pdf, other

    cs.LG stat.ML

    High-dimensional analysis of double descent for linear regression with random projections

    Authors: Francis Bach

    Abstract: We consider linear regression problems with a varying number of random projections, where we provably exhibit a double descent curve for a fixed prediction problem, with a high-dimensional analysis based on random matrix theory. We first consider the ridge regression estimator and review earlier results using classical notions from non-parametric statistics, namely degrees of freedom, also known a… ▽ More

    Submitted 14 March, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

  16. arXiv:2302.06757  [pdf, other

    stat.ML cs.LG math.NA math.ST

    Kernelized Diffusion maps

    Authors: Loucas Pillaud-Vivien, Francis Bach

    Abstract: Spectral clustering and diffusion maps are celebrated dimensionality reduction algorithms built on eigen-elements related to the diffusive structure of the data. The core of these procedures is the approximation of a Laplacian through a graph kernel approach, however this local average construction is known to be cursed by the high-dimension d. In this article, we build a different estimator of th… ▽ More

    Submitted 13 February, 2023; originally announced February 2023.

    Comments: 19 pages, 1 Figure

  17. arXiv:2302.03542  [pdf, other

    cs.LG math.OC

    Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy

    Authors: Blake Woodworth, Konstantin Mishchenko, Francis Bach

    Abstract: We present an algorithm for minimizing an objective with hard-to-compute gradients by using a related, easier-to-access function as a proxy. Our algorithm is based on approximate proximal point iterations on the proxy combined with relatively few stochastic gradients from the objective. When the difference between the objective and the proxy is $δ$-smooth, our algorithm guarantees convergence at a… ▽ More

    Submitted 7 June, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

  18. arXiv:2302.03459  [pdf, ps, other

    cs.LG math.ST stat.ML

    On the relationship between multivariate splines and infinitely-wide neural networks

    Authors: Francis Bach

    Abstract: We consider multivariate splines and show that they have a random feature expansion as infinitely wide neural networks with one-hidden layer and a homogeneous activation function which is the power of the rectified linear unit. We show that the associated function space is a Sobolev space on a Euclidean ball, with an explicit bound on the norms of derivatives. This link provides a new random featu… ▽ More

    Submitted 1 March, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

  19. arXiv:2211.05641  [pdf, other

    cs.LG cs.AI stat.ML

    Regression as Classification: Influence of Task Formulation on Neural Network Features

    Authors: Lawrence Stewart, Francis Bach, Quentin Berthet, Jean-Philippe Vert

    Abstract: Neural networks can be trained to solve regression problems by using gradient-based methods to minimize the square loss. However, practitioners often prefer to reformulate regression as a classification problem, observing that training on the cross entropy loss results in better performance. By focusing on two-layer ReLU networks, which can be fully characterized by measures over their feature spa… ▽ More

    Submitted 1 March, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

  20. arXiv:2209.09162  [pdf, other

    math.OC cs.LG

    On the Theoretical Properties of Noise Correlation in Stochastic Optimization

    Authors: Aurelien Lucchi, Frank Proske, Antonio Orvieto, Francis Bach, Hans Kersting

    Abstract: Studying the properties of stochastic noise to optimize complex non-convex functions has been an active area of research in the field of machine learning. Prior work has shown that the noise of stochastic gradient descent improves optimization by overcoming undesirable obstacles in the landscape. Moreover, injecting artificial Gaussian noise has become a popular idea to quickly escape saddle point… ▽ More

    Submitted 19 September, 2022; originally announced September 2022.

    Journal ref: Neurips 2022

  21. arXiv:2206.13285  [pdf, ps, other

    cs.IT cs.LG math.OC stat.ML

    Sum-of-Squares Relaxations for Information Theory and Variational Inference

    Authors: Francis Bach

    Abstract: We consider extensions of the Shannon relative entropy, referred to as $f$-divergences.Three classical related computational problems are typically associated with these divergences: (a) estimation from moments, (b) computing normalizing integrals, and (c) variational inference in probabilistic models. These problems are related to one another through convex duality, and for all them, there are… ▽ More

    Submitted 18 September, 2023; v1 submitted 27 June, 2022; originally announced June 2022.

  22. arXiv:2206.07638  [pdf, other

    math.OC cs.DC cs.LG

    Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays

    Authors: Konstantin Mishchenko, Francis Bach, Mathieu Even, Blake Woodworth

    Abstract: The existing analysis of asynchronous stochastic gradient descent (SGD) degrades dramatically when any delay is large, giving the impression that performance depends primarily on the delay. On the contrary, we prove much better guarantees for the same asynchronous SGD algorithm regardless of the delays in the gradients, depending instead just on the number of parallel devices used to implement the… ▽ More

    Submitted 20 April, 2023; v1 submitted 15 June, 2022; originally announced June 2022.

  23. arXiv:2206.04613  [pdf, other

    cs.LG stat.ML

    Explicit Regularization in Overparametrized Models via Noise Injection

    Authors: Antonio Orvieto, Anant Raj, Hans Kersting, Francis Bach

    Abstract: Injecting noise within gradient descent has several desirable features, such as smoothing and regularizing properties. In this paper, we investigate the effects of injecting noise before computing a gradient step. We demonstrate that small perturbations can induce explicit regularization for simple models based on the L1-norm, group L1-norms, or nuclear norms. However, when applied to overparametr… ▽ More

    Submitted 22 January, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: Accepted at AISTATS 2023 23 pages

  24. arXiv:2205.15902  [pdf, other

    stat.ML cs.LG math.ST

    Variational inference via Wasserstein gradient flows

    Authors: Marc Lambert, Sinho Chewi, Francis Bach, Silvère Bonnabel, Philippe Rigollet

    Abstract: Along with Markov chain Monte Carlo (MCMC) methods, variational inference (VI) has emerged as a central computational approach to large-scale Bayesian inference. Rather than sampling from the true posterior $π$, VI aims at producing a simple but effective approximation $\hat π$ to $π$ for which summary statistics are easy to compute. However, unlike the well-studied MCMC methodology, algorithmic g… ▽ More

    Submitted 21 April, 2023; v1 submitted 31 May, 2022; originally announced May 2022.

    Comments: 52 pages, 15 figures

  25. arXiv:2205.13255  [pdf, other

    cs.LG cs.AI cs.IR stat.ML

    Active Labeling: Streaming Stochastic Gradients

    Authors: Vivien Cabannes, Francis Bach, Vianney Perchet, Alessandro Rudi

    Abstract: The workhorse of machine learning is stochastic gradient descent. To access stochastic gradients, it is common to consider iteratively input/output pairs of a training dataset. Interestingly, it appears that one does not need full supervision to access stochastic gradients, which is the main motivation of this paper. After formalizing the "active labeling" problem, which focuses on active learning… ▽ More

    Submitted 7 December, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: 38 pages (9 main pages), 9 figures

    MSC Class: 68T37 ACM Class: G.3

  26. arXiv:2205.13076  [pdf, other

    cs.LG cs.AI math.ST

    On Bridging the Gap between Mean Field and Finite Width in Deep Random Neural Networks with Batch Normalization

    Authors: Amir Joudaki, Hadi Daneshmand, Francis Bach

    Abstract: Mean field theory is widely used in the theoretical studies of neural networks. In this paper, we analyze the role of depth in the concentration of mean-field predictions, specifically for deep multilayer perceptron (MLP) with batch normalization (BN) at initialization. By scaling the network width to infinity, it is postulated that the mean-field predictions suffer from layer-wise errors that amp… ▽ More

    Submitted 20 February, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

  27. arXiv:2205.12751  [pdf, ps, other

    math.OC cs.LG

    Fast Stochastic Composite Minimization and an Accelerated Frank-Wolfe Algorithm under Parallelization

    Authors: Benjamin Dubois-Taine, Francis Bach, Quentin Berthet, Adrien Taylor

    Abstract: We consider the problem of minimizing the sum of two convex functions. One of those functions has Lipschitz-continuous gradients, and can be accessed via stochastic oracles, whereas the other is "simple". We provide a Bregman-type algorithm with accelerated convergence in function values to a ball containing the minimum. The radius of this ball depends on problem-dependent constants, including the… ▽ More

    Submitted 12 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

  28. arXiv:2204.07879  [pdf, other

    cs.LG stat.ML

    Polynomial-time Sparse Measure Recovery: From Mean Field Theory to Algorithm Design

    Authors: Hadi Daneshmand, Francis Bach

    Abstract: Mean field theory has provided theoretical insights into various algorithms by letting the problem size tend to infinity. We argue that the applications of mean-field theory go beyond theoretical insights as it can inspire the design of practical algorithms. Leveraging mean-field analyses in physics, we propose a novel algorithm for sparse measure recovery. For sparse measures over $\mathbb{R}$, w… ▽ More

    Submitted 12 February, 2023; v1 submitted 16 April, 2022; originally announced April 2022.

  29. arXiv:2204.04970  [pdf, other

    cs.LG math.OC

    Non-Convex Optimization with Certificates and Fast Rates Through Kernel Sums of Squares

    Authors: Blake Woodworth, Francis Bach, Alessandro Rudi

    Abstract: We consider potentially non-convex optimization problems, for which optimal rates of approximation depend on the dimension of the parameter space and the smoothness of the function to be optimized. In this paper, we propose an algorithm that achieves close to optimal a priori computational guarantees, while also providing a posteriori certificates of optimality. Our general formulation builds on i… ▽ More

    Submitted 11 April, 2022; originally announced April 2022.

  30. arXiv:2202.08545  [pdf, ps, other

    cs.IT cs.LG math.OC stat.ML

    Information Theory with Kernel Methods

    Authors: Francis Bach

    Abstract: We consider the analysis of probability distributions through their associated covariance operators from reproducing kernel Hilbert spaces. We show that the von Neumann entropy and relative entropy of these operators are intimately related to the usual notions of Shannon entropy and relative entropy, and share many of their properties. They come together with efficient estimation algorithms from v… ▽ More

    Submitted 26 August, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

  31. arXiv:2202.07960  [pdf, other

    cs.LG cs.AI math.AP math.OC

    Temporal Difference Learning with Continuous Time and State in the Stochastic Setting

    Authors: Ziad Kobeissi, Francis Bach

    Abstract: We consider the problem of continuous-time policy evaluation. This consists in learning through observations the value function associated with an uncontrolled continuous-time stochastic dynamic and a reward function. We propose two original variants of the well-known TD(0) method using vanishing time steps. One is model-free and the other is model-based. For both methods, we prove theoretical con… ▽ More

    Submitted 7 June, 2023; v1 submitted 16 February, 2022; originally announced February 2022.

  32. arXiv:2202.02831  [pdf, other

    stat.ML cs.LG math.OC

    Anticorrelated Noise Injection for Improved Generalization

    Authors: Antonio Orvieto, Hans Kersting, Frank Proske, Francis Bach, Aurelien Lucchi

    Abstract: Injecting artificial noise into gradient descent (GD) is commonly employed to improve the performance of machine learning models. Usually, uncorrelated noise is used in such perturbed gradient descent (PGD) methods. It is, however, not known if this is optimal or whether other types of noise could provide better generalization performance. In this paper, we zoom in on the problem of correlating th… ▽ More

    Submitted 19 May, 2023; v1 submitted 6 February, 2022; originally announced February 2022.

    Comments: 22 pages, 16 figures

  33. arXiv:2201.11980  [pdf, ps, other

    stat.ML cs.LG

    Differential Privacy Guarantees for Stochastic Gradient Langevin Dynamics

    Authors: Théo Ryffel, Francis Bach, David Pointcheval

    Abstract: We analyse the privacy leakage of noisy stochastic gradient descent by modeling Rényi divergence dynamics with Langevin diffusions. Inspired by recent work on non-stochastic algorithms, we derive similar desirable properties in the stochastic setting. In particular, we prove that the privacy loss converges exponentially fast for smooth and strongly convex objectives under constant step size, which… ▽ More

    Submitted 5 February, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

  34. The openCARP CDE -- Concept for and implementation of a sustainable collaborative development environment for research software

    Authors: Felix Bach, Jochen Klar, Axel Loewe, Jorge Sánchez, Gunnar Seemann, Yung-Lin Huang, Robert Ulrich

    Abstract: This work describes the setup of an advanced technical infrastructure for collaborative software development (CDE) in large, distributed projects based on GitLab. We present its customization and extension, additional features and processes like code review, continuous automated testing, DevOps practices, and sustainable life-cycle management including long-term preservation and citable publishing… ▽ More

    Submitted 12 January, 2022; originally announced January 2022.

  35. arXiv:2112.01907  [pdf, other

    stat.ML cs.LG math.ST

    Near-optimal estimation of smooth transport maps with kernel sums-of-squares

    Authors: Boris Muzellec, Adrien Vacher, Francis Bach, François-Xavier Vialard, Alessandro Rudi

    Abstract: It was recently shown that under smoothness conditions, the squared Wasserstein distance between two distributions could be efficiently computed with appealing statistical error upper bounds. However, rather than the distance itself, the object of interest for applications such as generative modeling is the underlying optimal transport map. Hence, computational and statistical guarantees need to b… ▽ More

    Submitted 29 December, 2021; v1 submitted 3 December, 2021; originally announced December 2021.

  36. arXiv:2111.11306  [pdf, other

    stat.ML cs.LG

    Learning PSD-valued functions using kernel sums-of-squares

    Authors: Boris Muzellec, Francis Bach, Alessandro Rudi

    Abstract: Shape constraints such as positive semi-definiteness (PSD) for matrices or convexity for functions play a central role in many applications in machine learning and sciences, including metric learning, optimal transport, and economics. Yet, very few function models exist that enforce PSD-ness or convexity with good empirical performance and theoretical guarantees. In this paper, we introduce a kern… ▽ More

    Submitted 24 January, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

  37. arXiv:2110.15784  [pdf, other

    cs.LG math.OC

    Convergence of Uncertainty Sampling for Active Learning

    Authors: Anant Raj, Francis Bach

    Abstract: Uncertainty sampling in active learning is heavily used in practice to reduce the annotation cost. However, there has been no wide consensus on the function to be used for uncertainty estimation in binary classification tasks and convergence guarantees of the corresponding active learning algorithms are not well understood. The situation is even more challenging for multi-category classification.… ▽ More

    Submitted 29 October, 2021; originally announced October 2021.

  38. arXiv:2110.10527  [pdf, other

    cs.AI cs.LG math.ST

    Sampling from Arbitrary Functions via PSD Models

    Authors: Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi

    Abstract: In many areas of applied statistics and machine learning, generating an arbitrary number of independent and identically distributed (i.i.d.) samples from a given distribution is a key task. When the distribution is known only through evaluations of the density, current methods either scale badly with the dimension or require very involved implementations. Instead, we take a two-step approach by fi… ▽ More

    Submitted 28 October, 2021; v1 submitted 20 October, 2021; originally announced October 2021.

  39. arXiv:2110.08084  [pdf, other

    cs.LG math.OC math.ST

    Gradient Descent on Infinitely Wide Neural Networks: Global Convergence and Generalization

    Authors: Francis Bach, Lenaïc Chizat

    Abstract: Many supervised machine learning methods are naturally cast as optimization problems. For prediction models which are linear in their parameters, this often leads to convex problems for which many mathematical guarantees exist. Models which are non-linear in their parameters such as neural networks lead to non-convex optimization problems for which guarantees are harder to obtain. In this review p… ▽ More

    Submitted 15 October, 2021; originally announced October 2021.

  40. arXiv:2107.01106  [pdf, ps, other

    math.OC cs.LG

    Screening for a Reweighted Penalized Conditional Gradient Method

    Authors: Yifan Sun, Francis Bach

    Abstract: The conditional gradient method (CGM) is widely used in large-scale sparse convex optimization, having a low per iteration computational cost for structured sparse regularizers and a greedy approach to collecting nonzeros. We explore the sparsity acquiring properties of a general penalized CGM (P-CGM) for convex regularizers and a reweighted penalized CGM (RP-CGM) for nonconvex regularizers, repla… ▽ More

    Submitted 2 July, 2021; originally announced July 2021.

  41. arXiv:2106.09994  [pdf, other

    cs.LG stat.ML

    A Note on Optimizing Distributions using Kernel Mean Embeddings

    Authors: Boris Muzellec, Francis Bach, Alessandro Rudi

    Abstract: Kernel mean embeddings are a popular tool that consists in representing probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space. When the kernel is characteristic, mean embeddings can be used to define a distance between probability measures, known as the maximum mean discrepancy (MMD). A well-known advantage of mean embeddings and MMD is their low… ▽ More

    Submitted 27 June, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

  42. arXiv:2106.07644  [pdf, other

    math.OC cs.LG cs.MA math.PR stat.ML

    A Continuized View on Nesterov Acceleration for Stochastic Gradient Descent and Randomized Gossip

    Authors: Mathieu Even, Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Hadrien Hendrikx, Laurent Massoulié, Adrien Taylor

    Abstract: We introduce the continuized Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, o… ▽ More

    Submitted 27 October, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2102.06035

  43. arXiv:2106.03970  [pdf, other

    stat.ML cs.AI cs.LG

    Batch Normalization Orthogonalizes Representations in Deep Random Networks

    Authors: Hadi Daneshmand, Amir Joudaki, Francis Bach

    Abstract: This paper underlines a subtle property of batch-normalization (BN): Successive batch normalizations with random linear transformations make hidden representations increasingly orthogonal across layers of a deep neural network. We establish a non-asymptotic characterization of the interplay between depth, width, and the orthogonality of deep representations. More precisely, under a mild assumption… ▽ More

    Submitted 7 June, 2021; originally announced June 2021.

  44. arXiv:2105.15069  [pdf, other

    cs.LG stat.ML

    On the Consistency of Max-Margin Losses

    Authors: Alex Nowak-Vila, Alessandro Rudi, Francis Bach

    Abstract: The foundational concept of Max-Margin in machine learning is ill-posed for output spaces with more than two labels such as in structured prediction. In this paper, we show that the Max-Margin loss can only be consistent to the classification task under highly restrictive assumptions on the discrete loss measuring the error between outputs. These conditions are satisfied by distances defined in tr… ▽ More

    Submitted 21 March, 2022; v1 submitted 31 May, 2021; originally announced May 2021.

  45. arXiv:2102.06035  [pdf, other

    cs.DC math.OC

    A Continuized View on Nesterov Acceleration

    Authors: Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Adrien Taylor

    Abstract: We introduce the "continuized" Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process,… ▽ More

    Submitted 11 February, 2021; originally announced February 2021.

  46. arXiv:2102.02789  [pdf, other

    cs.LG cs.AI stat.ML

    Disambiguation of weak supervision with exponential convergence rates

    Authors: Vivien Cabannes, Francis Bach, Alessandro Rudi

    Abstract: Machine learning approached through supervised learning requires expensive annotation of data. This motivates weakly supervised learning, where data are annotated with incomplete yet discriminative information. In this paper, we focus on partial labelling, an instance of weak supervision where, from a given input, we are given a set of potential targets. We review a disambiguation principle to rec… ▽ More

    Submitted 15 July, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

    Comments: 22 pages; 6 figures

    MSC Class: 68Q32 ACM Class: I.2.6; G.3; F.2.2

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021

  47. arXiv:2102.00760  [pdf, ps, other

    stat.ML cs.AI cs.LG math.ST

    Fast rates in structured prediction

    Authors: Vivien Cabannes, Alessandro Rudi, Francis Bach

    Abstract: Discrete supervised learning problems such as classification are often tackled by introducing a continuous surrogate problem akin to regression. Bounding the original error, between estimate and solution, by the surrogate error endows discrete problems with convergence rates already shown for continuous instances. Yet, current approaches do not leverage the fact that discrete problems are essentia… ▽ More

    Submitted 15 July, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

    Comments: 14 main pages, 3 main figures, 43 pages, 4 figures (with appendix)

    MSC Class: 68T05 ACM Class: I.2.6; F.2.2; G.3

    Journal ref: Conference on Learning Theory, PMLR 134, 2021

  48. arXiv:2012.11978  [pdf, ps, other

    math.OC cs.LG stat.ML

    Finding Global Minima via Kernel Approximations

    Authors: Alessandro Rudi, Ulysse Marteau-Ferey, Francis Bach

    Abstract: We consider the global minimization of smooth functions based solely on function evaluations. Algorithms that achieve the optimal number of function evaluations for a given precision level typically rely on explicitly constructing an approximation of the function which is then minimized with algorithms that have exponential running-time complexity. In this paper, we consider an approach that joint… ▽ More

    Submitted 22 December, 2020; originally announced December 2020.

  49. arXiv:2012.09775  [pdf, other

    stat.ME cs.CR

    Differential privacy and noisy confidentiality concepts for European population statistics

    Authors: Fabian Bach

    Abstract: The paper aims to give an overview of various approaches to statistical disclosure control based on random noise that are currently being discussed for official population statistics and censuses. A particular focus is on a stringent delineation between different concepts influencing the discussion: we separate clearly between risk measures, noise distributions and output mechanisms - putting thes… ▽ More

    Submitted 17 December, 2020; originally announced December 2020.

    Comments: 37 pages, 7 figures, extended abstract accepted for NTTS 2021

  50. arXiv:2010.00892  [pdf, other

    cs.LG math.OC stat.ML

    Variance-Reduced Methods for Machine Learning

    Authors: Robert M. Gower, Mark Schmidt, Francis Bach, Peter Richtarik

    Abstract: Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a method introduced over 60 years ago. The last 8 years have seen an exciting new development: variance reduction (VR) for stochastic optimization methods. These VR methods excel in settings where more than one pass through the training data is allowed, achieving a faster conver… ▽ More

    Submitted 2 October, 2020; originally announced October 2020.

    Comments: 16 pages, 7 figures, 1 table

    MSC Class: 65K05; 68T99 ACM Class: G.1.6