
Showing 1–5 of 5 results for author: Gargiani, M

Searching in archive cs.
  1. arXiv:2202.00308  [pdf, other]

    cs.LG math.OC

    PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation

    Authors: Matilde Gargiani, Andrea Zanelli, Andrea Martinelli, Tyler Summers, John Lygeros

    Abstract: Despite their success, policy gradient methods suffer from high variance of the gradient estimate, which can result in unsatisfactory sample complexity. Recently, numerous variance-reduced extensions of policy gradient methods with provably better sample complexity and competitive numerical performance have been proposed. After a compact survey on some of the main variance-reduced REINFORCE-type m…

    Submitted 1 February, 2022; originally announced February 2022.

  2. arXiv:2011.10298  [pdf, other]

    cs.LG math.OC

    Convergence Analysis of Homotopy-SGD for non-convex optimization

    Authors: Matilde Gargiani, Andrea Zanelli, Quoc Tran-Dinh, Moritz Diehl, Frank Hutter

    Abstract: First-order stochastic methods for solving large-scale non-convex optimization problems are widely used in many big-data applications, e.g. training deep neural networks as well as other complex and potentially non-convex machine learning models. Their inexpensive iterations generally come together with slow global convergence rate (mostly sublinear), leading to the necessity of carrying out a ver…

    Submitted 20 November, 2020; originally announced November 2020.

    Comments: 21 pages, 14 figures, technical report

  3. arXiv:2006.02409  [pdf, other]

    cs.LG stat.ML

    On the Promise of the Stochastic Generalized Gauss-Newton Method for Training DNNs

    Authors: Matilde Gargiani, Andrea Zanelli, Moritz Diehl, Frank Hutter

    Abstract: Following early work on Hessian-free methods for deep learning, we study a stochastic generalized Gauss-Newton method (SGN) for training DNNs. SGN is a second-order optimization method, with efficient iterations, that we demonstrate to often require substantially fewer iterations than standard SGD to converge. As the name suggests, SGN uses a Gauss-Newton approximation for the Hessian matrix, and,…

    Submitted 9 June, 2020; v1 submitted 3 June, 2020; originally announced June 2020.

  4. arXiv:1910.04522  [pdf, other]

    cs.LG stat.ML

    Probabilistic Rollouts for Learning Curve Extrapolation Across Hyperparameter Settings

    Authors: Matilde Gargiani, Aaron Klein, Stefan Falkner, Frank Hutter

    Abstract: We propose probabilistic models that can extrapolate learning curves of iterative machine learning algorithms, such as stochastic gradient descent for training deep networks, based on training data with variable-length learning curves. We study instantiations of this framework based on random forests and Bayesian recurrent neural networks. Our experiments show that these models yield better predic…

    Submitted 10 October, 2019; originally announced October 2019.

  5. arXiv:1806.07569  [pdf, other]

    cs.LG stat.ML

    A Distributed Second-Order Algorithm You Can Trust

    Authors: Celestine Dünner, Aurelien Lucchi, Matilde Gargiani, An Bian, Thomas Hofmann, Martin Jaggi

    Abstract: Due to the rapid growth of data and computational resources, distributed optimization has become an active research area in recent years. While first-order methods seem to dominate the field, second-order methods are nevertheless attractive as they potentially require fewer communication rounds to converge. However, there are significant drawbacks that impede their wide adoption, such as the compu…

    Submitted 20 June, 2018; originally announced June 2018.

    Comments: appearing at ICML 2018 - Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018