Showing 1–33 of 33 results for author: Rudi, A

  1. arXiv:2406.12366  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Structured Prediction in Online Learning

    Authors: Pierre Boudart, Alessandro Rudi, Pierre Gaillard

    Abstract: We study a theoretical and algorithmic framework for structured prediction in the online learning setting. The problem of structured prediction, i.e. estimating a function where the output space lacks a vectorial structure, is well studied in the literature of supervised statistical learning. We show that our algorithm is a generalisation of optimal algorithms from the supervised learning setting, a…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 29 pages

  2. arXiv:2401.07734  [pdf, ps, other]

    math.OC

    Solving moment and polynomial optimization problems on Sobolev spaces

    Authors: Didier Henrion, Alessandro Rudi

    Abstract: Using standard tools of harmonic analysis, we state and solve the problem of moments for positive measures supported on the unit ball of a Sobolev space of multivariate periodic trigonometric functions. We describe outer and inner semidefinite approximations of the cone of Sobolev moments. They are the basic components of an infinite-dimensional moment-sums of squares hierarchy, allowing to solve…

    Submitted 15 January, 2024; originally announced January 2024.

  3. arXiv:2306.14932  [pdf, ps, other]

    math.OC cs.LG

    GloptiNets: Scalable Non-Convex Optimization with Certificates

    Authors: Gaspard Beugnot, Julien Mairal, Alessandro Rudi

    Abstract: We present a novel approach to non-convex optimization with certificates, which handles smooth functions on the hypercube or on the torus. Unlike traditional methods that rely on algebraic properties, our algorithm exploits the regularity of the target function intrinsic in the decay of its Fourier spectrum. By defining a tractable family of models, we allow at the same time to obtain precise cert…

    Submitted 20 December, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

    Comments: Edit affiliations and acknowledgments

  4. arXiv:2305.15557  [pdf, ps, other]

    cs.LG eess.SY math.OC

    Non-Parametric Learning of Stochastic Differential Equations with Non-asymptotic Fast Rates of Convergence

    Authors: Riccardo Bonalli, Alessandro Rudi

    Abstract: We propose a novel non-parametric learning paradigm for the identification of drift and diffusion coefficients of multi-dimensional non-linear stochastic differential equations, which relies upon discrete-time observations of the state. The key idea essentially consists of fitting a RKHS-based approximation of the corresponding Fokker-Planck equation to such observations, yielding theoretical esti…

    Submitted 23 April, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

  5. arXiv:2301.06339  [pdf, other]

    math.OC cs.LG

    Approximation of optimization problems with constraints through kernel Sum-Of-Squares

    Authors: Pierre-Cyril Aubin-Frankowski, Alessandro Rudi

    Abstract: Handling an infinite number of inequality constraints in infinite-dimensional spaces occurs in many fields, from global optimization to optimal transport. These problems have been tackled individually in several previous articles through kernel Sum-Of-Squares (kSoS) approximations. We propose here a unified theorem to prove convergence guarantees for these schemes. Pointwise inequalities are turne…

    Submitted 21 February, 2024; v1 submitted 16 January, 2023; originally announced January 2023.

    MSC Class: 46E22; 46N10; 90C26

  6. arXiv:2211.04889  [pdf, other]

    math.OC

    Exponential convergence of sum-of-squares hierarchies for trigonometric polynomials

    Authors: Francis Bach, Alessandro Rudi

    Abstract: We consider the unconstrained optimization of multivariate trigonometric polynomials by the sum-of-squares hierarchy of lower bounds. We first show a convergence rate of $O(1/s^2)$ for the relaxation with degree $s$ without any assumption on the trigonometric polynomial to minimize. Second, when the polynomial has a finite number of global minimizers with invertible Hessians at these minimizers, w…

    Submitted 18 April, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

    Journal ref: SIAM Journal on Optimization, In press

  7. arXiv:2204.04970  [pdf, other]

    cs.LG math.OC

    Non-Convex Optimization with Certificates and Fast Rates Through Kernel Sums of Squares

    Authors: Blake Woodworth, Francis Bach, Alessandro Rudi

    Abstract: We consider potentially non-convex optimization problems, for which optimal rates of approximation depend on the dimension of the parameter space and the smoothness of the function to be optimized. In this paper, we propose an algorithm that achieves close to optimal a priori computational guarantees, while also providing a posteriori certificates of optimality. Our general formulation builds on i…

    Submitted 11 April, 2022; originally announced April 2022.

  8. arXiv:2202.13733  [pdf, other]

    stat.ML cs.LG math.OC

    On the Benefits of Large Learning Rates for Kernel Methods

    Authors: Gaspard Beugnot, Julien Mairal, Alessandro Rudi

    Abstract: This paper studies an intriguing phenomenon related to the good generalization performance of estimators obtained by using large learning rates within gradient descent algorithms. First observed in the deep learning literature, we show that a phenomenon can be precisely characterized in the context of kernel methods, even though the resulting optimization problem is convex. Specifically, we consid…

    Submitted 3 June, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

    Comments: Accepted paper at Conference COLT 2022. To be published to Proceedings of Machine Learning Research (PMLR)

  9. arXiv:2202.13729  [pdf, other]

    math.OC

    Second order conditions to decompose smooth functions as sums of squares

    Authors: Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi

    Abstract: We consider the problem of decomposing a regular non-negative function as a sum of squares of functions which preserve some form of regularity. In the same way as decomposing non-negative polynomials as sum of squares of polynomials allows to derive methods in order to solve global optimization problems on polynomials, decomposing a regular function as a sum of squares allows to derive methods to…

    Submitted 28 February, 2022; originally announced February 2022.

  10. arXiv:2112.01907  [pdf, other]

    stat.ML cs.LG math.ST

    Near-optimal estimation of smooth transport maps with kernel sums-of-squares

    Authors: Boris Muzellec, Adrien Vacher, Francis Bach, François-Xavier Vialard, Alessandro Rudi

    Abstract: It was recently shown that under smoothness conditions, the squared Wasserstein distance between two distributions could be efficiently computed with appealing statistical error upper bounds. However, rather than the distance itself, the object of interest for applications such as generative modeling is the underlying optimal transport map. Hence, computational and statistical guarantees need to b…

    Submitted 29 December, 2021; v1 submitted 3 December, 2021; originally announced December 2021.

  11. arXiv:2110.10527  [pdf, other]

    cs.AI cs.LG math.ST

    Sampling from Arbitrary Functions via PSD Models

    Authors: Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi

    Abstract: In many areas of applied statistics and machine learning, generating an arbitrary number of independent and identically distributed (i.i.d.) samples from a given distribution is a key task. When the distribution is known only through evaluations of the density, current methods either scale badly with the dimension or require very involved implementations. Instead, we take a two-step approach by fi…

    Submitted 28 October, 2021; v1 submitted 20 October, 2021; originally announced October 2021.

  12. arXiv:2110.07396  [pdf, other]

    math.OC

    Infinite-Dimensional Sums-of-Squares for Optimal Control

    Authors: Eloïse Berthier, Justin Carpentier, Alessandro Rudi, Francis Bach

    Abstract: We introduce an approximation method to solve an optimal control problem via the Lagrange dual of its weak formulation. It is based on a sum-of-squares representation of the Hamiltonian, and extends a previous method from polynomial optimization to the generic case of smooth problems. Such a representation is infinite-dimensional and relies on a particular space of functions, a reproducing kernel H…

    Submitted 14 October, 2021; originally announced October 2021.

  13. arXiv:2110.03960  [pdf, other]

    cs.LG math.ST stat.ML

    Mixability made efficient: Fast online multiclass logistic regression

    Authors: Rémi Jézéquel, Pierre Gaillard, Alessandro Rudi

    Abstract: Mixability has been shown to be a powerful tool to obtain algorithms with optimal regret. However, the resulting methods often suffer from high computational complexity which has reduced their practical applicability. For example, in the case of multiclass logistic regression, the aggregating forecaster (Foster et al. (2018)) achieves a regret of $O(\log(Bn))$ whereas Online Newton Step achieves…

    Submitted 8 October, 2021; originally announced October 2021.

  14. arXiv:2106.16116  [pdf, ps, other]

    cs.LG math.ST stat.ML

    PSD Representations for Effective Probability Models

    Authors: Alessandro Rudi, Carlo Ciliberto

    Abstract: Finding a good way to model probability densities is key to probabilistic inference. An ideal model should be able to concisely approximate any probability while being also compatible with two main operations: multiplications of two models (product rule) and marginalization with respect to a subset of the random variables (sum rule). In this work, we show that a recently proposed class of positive…

    Submitted 24 November, 2021; v1 submitted 30 June, 2021; originally announced June 2021.

    Comments: 50 pages, 1 table

  15. arXiv:2102.03594  [pdf, other]

    math.ST cs.LG stat.ML

    Online nonparametric regression with Sobolev kernels

    Authors: Oleksandr Zadorozhnyi, Pierre Gaillard, Sebastien Gerschinovitz, Alessandro Rudi

    Abstract: In this work we investigate the variation of the online kernelized ridge regression algorithm in the setting of $d$-dimensional adversarial nonparametric regression. We derive the regret upper bounds on the classes of Sobolev spaces $W_{p}^{\beta}(\mathcal{X})$, $p \geq 2$, $\beta > \frac{d}{p}$. The upper bounds are supported by the minimax regret analysis, which reveals that in the cases $\beta > \frac{d}{2}$ or…

    Submitted 13 July, 2021; v1 submitted 6 February, 2021; originally announced February 2021.

    Comments: 40 pages, 5 figures, 3 tables (version 2)

  16. arXiv:2102.00760  [pdf, ps, other]

    stat.ML cs.AI cs.LG math.ST

    Fast rates in structured prediction

    Authors: Vivien Cabannes, Alessandro Rudi, Francis Bach

    Abstract: Discrete supervised learning problems such as classification are often tackled by introducing a continuous surrogate problem akin to regression. Bounding the original error, between estimate and solution, by the surrogate error endows discrete problems with convergence rates already shown for continuous instances. Yet, current approaches do not leverage the fact that discrete problems are essentia…

    Submitted 15 July, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

    Comments: 14 main pages, 3 main figures, 43 pages, 4 figures (with appendix)

    MSC Class: 68T05 ACM Class: I.2.6; F.2.2; G.3

    Journal ref: Conference on Learning Theory, PMLR 134, 2021

  17. arXiv:2101.05380  [pdf, other]

    math.ST math.OC

    A Dimension-free Computational Upper-bound for Smooth Optimal Transport Estimation

    Authors: Adrien Vacher, Boris Muzellec, Alessandro Rudi, Francis Bach, Francois-Xavier Vialard

    Abstract: It is well-known that plug-in statistical estimation of optimal transport suffers from the curse of dimensionality. Despite recent efforts to improve the rate of estimation with the smoothness of the problem, the computational complexity of these recently proposed methods still degrades exponentially with the dimension. In this paper, thanks to an infinite-dimensional sum-of-squares representation…

    Submitted 1 October, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

    Comments: 30 pages

    MSC Class: 62G05

  18. arXiv:2012.11978  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Finding Global Minima via Kernel Approximations

    Authors: Alessandro Rudi, Ulysse Marteau-Ferey, Francis Bach

    Abstract: We consider the global minimization of smooth functions based solely on function evaluations. Algorithms that achieve the optimal number of function evaluations for a given precision level typically rely on explicitly constructing an approximation of the function which is then minimized with algorithms that have exponential running-time complexity. In this paper, we consider an approach that joint…

    Submitted 22 December, 2020; originally announced December 2020.

  19. arXiv:2007.03926  [pdf, other]

    cs.LG cs.AI math.ST

    Non-parametric Models for Non-negative Functions

    Authors: Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi

    Abstract: Linear models have shown great effectiveness and flexibility in many fields such as machine learning, signal processing and statistics. They can represent rich spaces of functions while preserving the convexity of the optimization problems where they are used, and are simple to evaluate, differentiate and integrate. However, for modeling non-negative functions, which are crucial for unsupervised…

    Submitted 8 July, 2020; originally announced July 2020.

  20. arXiv:2006.09984   

    stat.ML cs.LG math.NA

    Interpolation and Learning with Scale Dependent Kernels

    Authors: Nicolò Pagliana, Alessandro Rudi, Ernesto De Vito, Lorenzo Rosasco

    Abstract: We study the learning properties of nonparametric ridge-less least squares. In particular, we consider the common case of estimators defined by scale dependent kernels, and focus on the role of the scale. These estimators interpolate the data and the scale can be shown to control their stability through the condition number. Our analysis shows that there are different regimes depending on the interplay…

    Submitted 10 November, 2021; v1 submitted 17 June, 2020; originally announced June 2020.

    Comments: The paper is not completed and contains parts which need to be modified

  21. arXiv:2003.08109  [pdf, other]

    cs.LG math.ST stat.ML

    Efficient improper learning for online logistic regression

    Authors: Rémi Jézéquel, Pierre Gaillard, Alessandro Rudi

    Abstract: We consider the setting of online logistic regression and consider the regret with respect to the 2-ball of radius B. It is known (see [Hazan et al., 2014]) that any proper algorithm which has logarithmic regret in the number of samples (denoted n) necessarily suffers an exponential multiplicative constant in B. In this work, we design an efficient improper algorithm that avoids this exponential c…

    Submitted 3 November, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

    Journal ref: Conference on Learning Theory 2020, Jul 2020, Graz, Austria

  22. arXiv:2002.05424  [pdf, ps, other]

    stat.ML cs.LG math.ST

    A General Framework for Consistent Structured Prediction with Implicit Loss Embeddings

    Authors: Carlo Ciliberto, Lorenzo Rosasco, Alessandro Rudi

    Abstract: We propose and analyze a novel theoretical and algorithmic framework for structured prediction. While so far the term has referred to discrete output spaces, here we consider more general settings, such as manifolds or spaces of probability measures. We define structured prediction as a problem where the output space lacks a vectorial structure. We identify and study a large class of loss function…

    Submitted 13 February, 2020; originally announced February 2020.

    Comments: 53 pages

  23. arXiv:1910.14564  [pdf, other]

    math.PR math.OC

    Statistical Estimation of the Poincaré constant and Application to Sampling Multimodal Distributions

    Authors: Loucas Pillaud-Vivien, Francis Bach, Tony Lelièvre, Alessandro Rudi, Gabriel Stoltz

    Abstract: Poincaré inequalities are ubiquitous in probability and analysis and have various applications in statistics (concentration of measure, rate of convergence of Markov chains). The Poincaré constant, for which the inequality is tight, is related to the typical convergence rate of diffusions to their equilibrium measure. In this paper, we show both theoretically and experimentally that, given suf…

    Submitted 22 November, 2019; v1 submitted 28 October, 2019; originally announced October 2019.

  24. arXiv:1907.05226  [pdf, other]

    stat.ML cs.LG math.ST

    Gain with no Pain: Efficient Kernel-PCA by Nyström Sampling

    Authors: Nicholas Sterge, Bharath Sriperumbudur, Lorenzo Rosasco, Alessandro Rudi

    Abstract: In this paper, we propose and study a Nyström based approach to efficient large scale kernel principal component analysis (PCA). The latter is a natural nonlinear extension of classical PCA based on considering a nonlinear feature map or the corresponding kernel. Like other kernel approaches, kernel PCA enjoys good mathematical and statistical properties but, numerically, it scales poorly with the…

    Submitted 11 July, 2019; originally announced July 2019.

    Comments: 19 pages, 2 figures

    MSC Class: 62H25; 62H12; 46E22

  25. arXiv:1907.01771  [pdf, other]

    math.OC cs.AI cs.LG stat.ML

    Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses

    Authors: Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi

    Abstract: In this paper, we study large-scale convex optimization algorithms based on the Newton method applied to regularized generalized self-concordant losses, which include logistic regression and softmax regression. We first prove that our new simple scheme based on a sequence of problems with decreasing regularization parameters is provably globally convergent, that this convergence is linear with a c…

    Submitted 21 November, 2019; v1 submitted 3 July, 2019; originally announced July 2019.

    Journal ref: NeurIPS 2019 - Conference on Neural Information Processing Systems, Dec 2019, Vancouver, Canada

  26. arXiv:1902.09917  [pdf, other]

    stat.ML cs.LG math.ST

    Efficient online learning with kernels for adversarial large scale problems

    Authors: Rémi Jézéquel, Pierre Gaillard, Alessandro Rudi

    Abstract: We are interested in a framework of online learning with kernels for low-dimensional but large-scale and potentially adversarial datasets. We study the computational and theoretical performance of online variations of kernel Ridge regression. Despite its simplicity, the algorithm we study is the first to achieve the optimal regret for a wide range of kernels with a per-round complexity of order…

    Submitted 29 May, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

  27. arXiv:1902.03086  [pdf, ps, other]

    math.ST math.PR stat.ML

    Affine Invariant Covariance Estimation for Heavy-Tailed Distributions

    Authors: Dmitrii Ostrovskii, Alessandro Rudi

    Abstract: In this work we provide an estimator for the covariance matrix of a heavy-tailed multivariate distribution. We prove that the proposed estimator $\widehat{\mathbf{S}}$ admits an \textit{affine-invariant} bound of the form \[(1-\varepsilon) \mathbf{S} \preccurlyeq \widehat{\mathbf{S}} \preccurlyeq (1+\varepsilon) \mathbf{S}\] in high probability, where $\mathbf{S}$ is the unknown covariance matrix, an…

    Submitted 24 September, 2019; v1 submitted 8 February, 2019; originally announced February 2019.

    Journal ref: 32nd Annual Conference on Learning Theory (COLT), 2019, Jun 2019, Phoenix, United States

  28. arXiv:1902.03046  [pdf, ps, other]

    cs.LG cs.AI math.ST

    Beyond Least-Squares: Fast Rates for Regularized Empirical Risk Minimization through Self-Concordance

    Authors: Ulysse Marteau-Ferey, Dmitrii Ostrovskii, Francis Bach, Alessandro Rudi

    Abstract: We consider learning methods based on the regularization of a convex empirical risk by a squared Hilbertian norm, a setting that includes linear predictors and non-linear predictors through positive-definite kernels. In order to go beyond the generic analysis leading to convergence rates of the excess risk as $O(1/\sqrt{n})$ from $n$ observations, we assume that the individual losses are self-conc…

    Submitted 18 June, 2019; v1 submitted 8 February, 2019; originally announced February 2019.

  29. arXiv:1812.05189  [pdf, other]

    stat.ML cs.DS cs.LG math.OC

    Massively scalable Sinkhorn distances via the Nyström method

    Authors: Jason Altschuler, Francis Bach, Alessandro Rudi, Jonathan Niles-Weed

    Abstract: The Sinkhorn "distance", a variant of the Wasserstein distance with entropic regularization, is an increasingly popular tool in machine learning and statistical inference. However, the time and memory requirements of standard algorithms for computing this distance grow quadratically with the size of the data, making them prohibitively expensive on massive data sets. In this work, we show that this…

    Submitted 26 October, 2019; v1 submitted 12 December, 2018; originally announced December 2018.

    Comments: to appear in NeurIPS 2019

    Journal ref: Advances in Neural Information Processing Systems 32 (NeurIPS 2019)

  30. arXiv:1810.10046  [pdf, ps, other]

    cs.DS math.OC

    Approximating the Quadratic Transportation Metric in Near-Linear Time

    Authors: Jason Altschuler, Francis Bach, Alessandro Rudi, Jonathan Weed

    Abstract: Computing the quadratic transportation metric (also called the $2$-Wasserstein distance or root mean square distance) between two point clouds, or, more generally, two discrete distributions, is a fundamental problem in machine learning, statistics, computer graphics, and theoretical computer science. A long line of work has culminated in a sophisticated geometric algorithm due to Agarwal and Shar…

    Submitted 16 December, 2018; v1 submitted 23 October, 2018; originally announced October 2018.

    Comments: unchanged from v1; this article now superseded by arXiv:1812.05189

  31. arXiv:1810.06839  [pdf, ps, other]

    cs.LG cs.AI cs.CC math.ST stat.ML

    Sharp Analysis of Learning with Discrete Losses

    Authors: Alex Nowak-Vila, Francis Bach, Alessandro Rudi

    Abstract: The problem of devising learning strategies for discrete losses (e.g., multilabeling, ranking) is currently addressed with methods and theoretical analyses ad-hoc for each loss. In this paper we study a least-squares framework to systematically design learning algorithms for discrete losses, with quantitative characterizations in terms of statistical and computational complexity. In particular we…

    Submitted 16 October, 2018; originally announced October 2018.

  32. arXiv:1805.10074  [pdf, other]

    cs.LG math.OC math.ST stat.ML

    Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes

    Authors: Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach

    Abstract: We consider stochastic gradient descent (SGD) for least-squares regression with potentially several passes over the data. While several passes have been widely reported to perform practically better in terms of predictive performance on unseen data, the existing theoretical analysis of SGD suggests that a single pass is statistically optimal. While this is true for low-dimensional easy problems, w…

    Submitted 23 November, 2018; v1 submitted 25 May, 2018; originally announced May 2018.

    Journal ref: Neural Information Processing Systems (NIPS), Dec 2018, Montréal, Canada. 2018

  33. arXiv:1801.06720  [pdf, ps, other]

    stat.ML cs.LG math.FA

    Optimal Rates for Spectral Algorithms with Least-Squares Regression over Hilbert Spaces

    Authors: Junhong Lin, Alessandro Rudi, Lorenzo Rosasco, Volkan Cevher

    Abstract: In this paper, we study regression problems over a separable Hilbert space with the square loss, covering non-parametric regression over a reproducing kernel Hilbert space. We investigate a class of spectral/regularized algorithms, including ridge regression, principal component regression, and gradient methods. We prove optimal, high-probability convergence results in terms of variants of norms f…

    Submitted 15 July, 2022; v1 submitted 20 January, 2018; originally announced January 2018.

    Comments: Updating acknowledgments; Journal version

    Journal ref: Applied and Computational Harmonic Analysis 48 (2020) 868-890