Showing 1–23 of 23 results for author: Orabona, F

  1. arXiv:2406.01577  [pdf, ps, other]

    cs.LG math.OC stat.ML

    An Equivalence Between Static and Dynamic Regret Minimization

    Authors: Andrew Jacobsen, Francesco Orabona

    Abstract: We study the problem of dynamic regret minimization in online convex optimization, in which the objective is to minimize the difference between the cumulative loss of an algorithm and that of an arbitrary sequence of comparators. While the literature on this topic is very rich, a unifying framework for the analysis and design of these algorithms is still missing. In this paper, we show that…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 26 pages

  2. arXiv:2308.05621  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Normalized Gradients for All

    Authors: Francesco Orabona

    Abstract: In this short note, I show how to adapt to Hölder smoothness using normalized gradients in a black-box way. Moreover, the bound will depend on a novel notion of local Hölder smoothness. The main idea directly comes from Levy [2017].

    Submitted 10 August, 2023; originally announced August 2023.

  3. arXiv:2306.00201  [pdf, other]

    cs.LG math.OC stat.ML

    Generalized Implicit Follow-The-Regularized-Leader

    Authors: Keyi Chen, Francesco Orabona

    Abstract: We propose a new class of online learning algorithms, generalized implicit Follow-The-Regularized-Leader (FTRL), that expands the scope of the FTRL framework. Generalized implicit FTRL can recover known algorithms, such as FTRL with linearized losses and implicit FTRL, and it allows the design of new update rules, such as extensions of aProx and Mirror-Prox to FTRL. Our theory is constructive in the sense that…

    Submitted 31 May, 2023; originally announced June 2023.

  4. arXiv:2302.03775  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Optimal Stochastic Non-smooth Non-convex Optimization through Online-to-Non-convex Conversion

    Authors: Ashok Cutkosky, Harsh Mehta, Francesco Orabona

    Abstract: We present new algorithms for optimizing non-smooth, non-convex stochastic objectives based on a novel analysis technique. This improves the current best-known complexity for finding a $(δ,ε)$-stationary point from $O(ε^{-4}δ^{-1})$ stochastic gradient queries to $O(ε^{-3}δ^{-1})$, which we also show to be optimal. Our primary technique is a reduction from non-smooth non-convex optimization to onl…

    Submitted 11 February, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

  5. arXiv:2208.11195  [pdf, other]

    cs.LG math.OC

    Robustness to Unbounded Smoothness of Generalized SignSGD

    Authors: Michael Crawshaw, Mingrui Liu, Francesco Orabona, Wei Zhang, Zhenxun Zhuang

    Abstract: Traditional analyses in non-convex optimization typically rely on the smoothness assumption, namely requiring the gradients to be Lipschitz. However, recent evidence shows that this smoothness condition does not capture the properties of some deep learning objective functions, including the ones involving Recurrent Neural Networks and LSTMs. Instead, they satisfy a much more relaxed condition, wit…

    Submitted 23 August, 2022; originally announced August 2022.

  6. arXiv:2202.00089  [pdf, other]

    cs.LG math.OC

    Understanding AdamW through Proximal Methods and Scale-Freeness

    Authors: Zhenxun Zhuang, Mingrui Liu, Ashok Cutkosky, Francesco Orabona

    Abstract: Adam has been widely adopted for training deep neural networks due to less hyperparameter tuning and remarkable performance. To improve generalization, Adam is typically used in tandem with a squared $\ell_2$ regularizer (referred to as Adam-$\ell_2$). However, even better performance can be obtained with AdamW, which decouples the gradient of the regularizer from the update rule of Adam-$\ell_2$.…

    Submitted 31 January, 2022; originally announced February 2022.

  7. arXiv:2110.14099  [pdf, other]

    stat.ML cs.IT cs.LG math.ST stat.ME

    Tight Concentrations and Confidence Sequences from the Regret of Universal Portfolio

    Authors: Francesco Orabona, Kwang-Sung Jun

    Abstract: A classic problem in statistics is the estimation of the expectation of random variables from samples. This gives rise to the tightly connected problems of deriving concentration inequalities and confidence sequences, that is confidence intervals that hold uniformly over time. Previous work has shown how to easily convert the regret guarantee of an online betting algorithm into a time-uniform conc…

    Submitted 31 July, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

  8. arXiv:2103.00284  [pdf, other]

    math.OC cs.LG

    On the Initialization for Convex-Concave Min-max Problems

    Authors: Mingrui Liu, Francesco Orabona

    Abstract: Convex-concave min-max problems are ubiquitous in machine learning, and people usually utilize first-order methods (e.g., gradient descent ascent) to find the optimal solution. One feature which separates convex-concave min-max problems from convex minimization problems is that the best known convergence rates for min-max problems have an explicit dependence on the size of the domain, rather than…

    Submitted 7 March, 2022; v1 submitted 27 February, 2021; originally announced March 2021.

    Comments: Accepted by ALT 2022

  9. arXiv:2102.07002  [pdf, other]

    cs.LG math.OC stat.ML

    On the Last Iterate Convergence of Momentum Methods

    Authors: Xiaoyu Li, Mingrui Liu, Francesco Orabona

    Abstract: SGD with Momentum (SGDM) is a widely used family of algorithms for large-scale optimization of machine learning problems. Yet, when optimizing generic convex functions, no advantage is known for any SGDM algorithm over plain SGD. Moreover, even the most recent results require changes to the SGDM algorithms, like averaging of the iterates and a projection onto a bounded domain, which are rarely use…

    Submitted 24 July, 2022; v1 submitted 13 February, 2021; originally announced February 2021.

    Comments: Differences with ALT'22 camera ready: Clarified the statement of the lower bound

  10. arXiv:2102.00236  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Parameter-free Stochastic Optimization of Variationally Coherent Functions

    Authors: Francesco Orabona, Dávid Pál

    Abstract: We design and analyze an algorithm for first-order stochastic optimization of a large class of functions on $\mathbb{R}^d$. In particular, we consider the \emph{variationally coherent} functions which can be convex or non-convex. The iterates of our algorithm on variationally coherent functions converge almost surely to the global minimizer $\boldsymbol{x}^*$. Additionally, the very same algorithm…

    Submitted 30 January, 2021; originally announced February 2021.

  11. arXiv:2011.11985  [pdf, ps, other]

    cs.LG math.OC

    Adam$^+$: A Stochastic Method with Adaptive Variance Reduction

    Authors: Mingrui Liu, Wei Zhang, Francesco Orabona, Tianbao Yang

    Abstract: Adam is a widely used stochastic optimization method for deep learning applications. While practitioners prefer Adam because it requires less parameter tuning, its use is problematic from a theoretical point of view since it may not converge. Variants of Adam have been proposed with provable convergence guarantee, but they tend not to be competitive with Adam on the practical performance. In this pap…

    Submitted 24 November, 2020; originally announced November 2020.

  12. arXiv:1912.13213  [pdf, other]

    cs.LG math.OC stat.ML

    A Modern Introduction to Online Learning

    Authors: Francesco Orabona

    Abstract: In this monograph, I introduce the basic concepts of Online Learning through a modern view of Online Convex Optimization. Here, online learning refers to the framework of regret minimization under worst-case assumptions. I present first-order and second-order algorithms for online learning with convex losses, in Euclidean and non-Euclidean settings. All the algorithms are clearly presented as inst…

    Submitted 27 May, 2023; v1 submitted 31 December, 2019; originally announced December 2019.

    Comments: Major update: Two new chapters (saddle-point optimization and universal portfolio); added missing proofs; fixed issues due to empty interior simplex; fixed a lot of typos

  13. arXiv:1911.09564  [pdf, other]

    cs.LG cs.CR math.OC stat.ML

    Parameter-Free Locally Differentially Private Stochastic Subgradient Descent

    Authors: Kwang-Sung Jun, Francesco Orabona

    Abstract: We consider the problem of minimizing a convex risk with stochastic subgradients guaranteeing $ε$-locally differentially private ($ε$-LDP). While it has been shown that stochastic optimization is possible with $ε$-LDP via the standard SGD (Song et al., 2013), its convergence rate largely depends on the learning rate, which must be tuned via repeated runs. Further, tuning is detrimental to privacy…

    Submitted 21 November, 2019; originally announced November 2019.

    Comments: to appear at Privacy in Machine Learning (PriML) workshop, NeurIPS'19

  14. arXiv:1905.10018  [pdf, other]

    cs.LG math.OC stat.ML

    Momentum-Based Variance Reduction in Non-Convex SGD

    Authors: Ashok Cutkosky, Francesco Orabona

    Abstract: Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the convergence rate of stochastic gradient descent for finding first-order critical points. However, variance reduction techniques typically require carefully tuned learning rates and willingness to use excessively large "mega-bat…

    Submitted 21 April, 2020; v1 submitted 23 May, 2019; originally announced May 2019.

    Comments: Added Ack

  15. arXiv:1902.01500  [pdf, other]

    cs.LG math.OC stat.ML

    Parameter-Free Online Convex Optimization with Sub-Exponential Noise

    Authors: Kwang-Sung Jun, Francesco Orabona

    Abstract: We consider the problem of unconstrained online convex optimization (OCO) with sub-exponential noise, a strictly more general problem than the standard OCO. In this setting, the learner receives a subgradient of the loss functions corrupted by sub-exponential noise and strives to achieve optimal regret guarantee, without knowledge of the competitor norm, i.e., in a parameter-free way. Recently, Cu…

    Submitted 20 September, 2019; v1 submitted 4 February, 2019; originally announced February 2019.

    Comments: v1: Accepted to COLT'19, v2: adjusted Theorem 3, w_t closed form solution, and typos

  16. arXiv:1901.09068  [pdf, other]

    cs.LG math.OC stat.ML

    Surrogate Losses for Online Learning of Stepsizes in Stochastic Non-Convex Optimization

    Authors: Zhenxun Zhuang, Ashok Cutkosky, Francesco Orabona

    Abstract: Stochastic Gradient Descent (SGD) has played a central role in machine learning. However, it requires a carefully hand-picked stepsize for fast convergence, which is notoriously tedious and time-consuming to tune. Over the last several years, a plethora of adaptive gradient-based algorithms have emerged to ameliorate this problem. They have proved efficient in reducing the labor of tuning in pract…

    Submitted 7 June, 2019; v1 submitted 25 January, 2019; originally announced January 2019.

  17. arXiv:1805.08114  [pdf, ps, other]

    stat.ML cs.LG math.OC

    On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes

    Authors: Xiaoyu Li, Francesco Orabona

    Abstract: Stochastic gradient descent is the method of choice for large scale optimization of machine learning objective functions. Yet, its performance is greatly variable and heavily depends on the choice of the stepsizes. This has motivated a large body of research on adaptive stepsizes. However, there is currently a gap in our theoretical understanding of these methods, especially in the non-convex sett…

    Submitted 26 February, 2019; v1 submitted 21 May, 2018; originally announced May 2018.

    Comments: More discussion on related work

  18. arXiv:1802.06293  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Black-Box Reductions for Parameter-free Online Learning in Banach Spaces

    Authors: Ashok Cutkosky, Francesco Orabona

    Abstract: We introduce several new black-box reductions that significantly improve the design of adaptive and parameter-free online learning algorithms by simplifying analysis, improving regret guarantees, and sometimes even improving runtime. We reduce parameter-free online learning to online exp-concave optimization, we reduce optimization in a Banach space to one-dimensional optimization, and we reduce o…

    Submitted 25 June, 2018; v1 submitted 17 February, 2018; originally announced February 2018.

    Comments: Appears in Conference on Learning Theory 2018

  19. arXiv:1705.07795  [pdf, other]

    cs.LG math.OC stat.ML

    Training Deep Networks without Learning Rates Through Coin Betting

    Authors: Francesco Orabona, Tatiana Tommasi

    Abstract: Deep learning methods achieve state-of-the-art performance in many application scenarios. Yet, these methods require a significant amount of hyperparameter tuning in order to achieve the best results. In particular, tuning the learning rates in the stochastic optimization process is still one of the main bottlenecks. In this paper, we propose a new stochastic gradient descent procedure for deep n…

    Submitted 4 November, 2017; v1 submitted 22 May, 2017; originally announced May 2017.

    Comments: Camera-ready version for NIPS 2017

  20. arXiv:1502.05744  [pdf, ps, other]

    cs.LG math.OC

    Scale-Free Algorithms for Online Linear Optimization

    Authors: Francesco Orabona, David Pal

    Abstract: We design algorithms for online linear optimization that have optimal regret and at the same time do not need to know any upper or lower bounds on the norm of the loss vectors. We achieve adaptiveness to norms of loss vectors by scale invariance, i.e., our algorithms make exactly the same decisions if the sequence of loss vectors is multiplied by any positive constant. Our algorithms work for any…

    Submitted 1 July, 2015; v1 submitted 19 February, 2015; originally announced February 2015.

  21. arXiv:1502.01632  [pdf, ps, other]

    cs.LG math.PR

    A Simple Expression for Mill's Ratio of the Student's $t$-Distribution

    Authors: Francesco Orabona

    Abstract: I show a simple expression of the Mill's ratio of the Student's t-Distribution. I use it to prove Conjecture 1 in P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Mach. Learn., 47(2-3):235--256, May 2002.

    Submitted 5 February, 2015; originally announced February 2015.

  22. arXiv:1310.4227  [pdf, other]

    cs.LG math.PR

    On Measure Concentration of Random Maximum A-Posteriori Perturbations

    Authors: Francesco Orabona, Tamir Hazan, Anand D. Sarwate, Tommi Jaakkola

    Abstract: The maximum a-posteriori (MAP) perturbation framework has emerged as a useful approach for inference and learning in high dimensional complex models. By maximizing a randomly perturbed potential function, MAP perturbations generate unbiased samples from the Gibbs distribution. Unfortunately, the computational cost of generating so many high-dimensional random variables can be prohibitive. More eff…

    Submitted 15 October, 2013; originally announced October 2013.

  23. arXiv:1206.2372  [pdf, other]

    math.OC cs.LG

    PRISMA: PRoximal Iterative SMoothing Algorithm

    Authors: Francesco Orabona, Andreas Argyriou, Nathan Srebro

    Abstract: Motivated by learning problems including max-norm regularized matrix completion and clustering, robust PCA and sparse inverse covariance selection, we propose a novel optimization algorithm for minimizing a convex objective which decomposes into three parts: a smooth part, a simple non-smooth Lipschitz part, and a simple non-smooth non-Lipschitz part. We use a time variant smoothing strategy that…

    Submitted 18 November, 2012; v1 submitted 11 June, 2012; originally announced June 2012.