
Showing 1–11 of 11 results for author: Grazzi, R

Searching in archive cs.

  1. arXiv:2403.11687

    stat.ML cs.LG math.OC

    Nonsmooth Implicit Differentiation: Deterministic and Stochastic Convergence Rates

    Authors: Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo

    Abstract: We study the problem of efficiently computing the derivative of the fixed point of a parametric nondifferentiable contraction map. This problem has wide applications in machine learning, including hyperparameter optimization, meta-learning and data poisoning attacks. We analyze two popular approaches: iterative differentiation (ITD) and approximate implicit differentiation (AID). A key challenge b…

    Submitted 4 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: ICML 2024. Code at github.com/prolearner/nonsmooth_implicit_diff
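The ITD/AID comparison in entry 1 can be made concrete on a toy nonsmooth contraction. The sketch below assumes a lasso problem solved by proximal gradient, Phi(w, lam) = soft_threshold(w - gamma * A^T (A w - b), gamma * lam), which is a contraction when A has full column rank and gamma = 1/L, with L the largest eigenvalue of A^T A. ITD propagates dw/dlam forward through the iterations using the almost-everywhere derivative of soft thresholding; AID instead solves (I - dPhi/dw) v = dPhi/dlam once at the last iterate. The problem, the function names and the dense direct solve are illustrative assumptions, not the method or code in the paper's repository.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal map of t * ||.||_1: 1-Lipschitz but not differentiable at |z| = t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def phi_and_jacobians(w, lam, A, b, gamma):
    """One proximal-gradient step Phi(w, lam) for the lasso, plus its a.e. partial derivatives."""
    z = w - gamma * A.T @ (A @ w - b)                    # gradient step on 0.5 * ||A w - b||^2
    active = (np.abs(z) > gamma * lam).astype(float)     # derivative of soft_threshold in z: 1 or 0
    d_w = active[:, None] * (np.eye(w.size) - gamma * A.T @ A)  # dPhi/dw
    d_lam = -gamma * np.sign(z) * active                 # dPhi/dlam
    return soft_threshold(z, gamma * lam), d_w, d_lam


def itd(lam, A, b, gamma, iters):
    """Iterative differentiation: forward-mode chain rule through every iteration."""
    w = np.zeros(A.shape[1])
    dw = np.zeros_like(w)
    for _ in range(iters):
        w_next, d_w, d_lam = phi_and_jacobians(w, lam, A, b, gamma)
        dw = d_w @ dw + d_lam                            # d w_{t+1} / d lam
        w = w_next
    return w, dw


def aid(lam, A, b, gamma, iters):
    """Approximate implicit differentiation: iterate to w_T, then solve one linear system."""
    w = np.zeros(A.shape[1])
    for _ in range(iters):
        w, _, _ = phi_and_jacobians(w, lam, A, b, gamma)
    _, d_w, d_lam = phi_and_jacobians(w, lam, A, b, gamma)
    v = np.linalg.solve(np.eye(w.size) - d_w, d_lam)     # (I - dPhi/dw) v = dPhi/dlam
    return w, v
```

Both estimates should agree with the closed-form derivative -(A_S^T A_S)^{-1} sign(w_S) on the support S of the solution (and 0 off it), provided no coordinate sits exactly at the soft-thresholding kink, where the derivative is undefined.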

  2. arXiv:2402.03170

    cs.LG

    Is Mamba Capable of In-Context Learning?

    Authors: Riccardo Grazzi, Julien Siems, Simon Schrodi, Thomas Brox, Frank Hutter

    Abstract: State-of-the-art foundation models such as GPT-4 perform surprisingly well at in-context learning (ICL), a variant of meta-learning concerning the learned ability to solve tasks during a neural network forward pass, exploiting contextual information provided as input to the model. This useful ability emerges as a side product of the foundation model's massive pretraining. While transformer models…

    Submitted 24 April, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  3. arXiv:2307.09912

    cs.LG

    Learning invariant representations of time-homogeneous stochastic dynamical systems

    Authors: Vladimir R. Kostic, Pietro Novelli, Riccardo Grazzi, Karim Lounici, Massimiliano Pontil

    Abstract: We consider the general class of time-homogeneous stochastic dynamical systems, both discrete and continuous, and study the problem of learning a representation of the state that faithfully captures its dynamics. This is instrumental to learning the transfer operator or the generator of the system, which in turn can be used for numerous tasks, such as forecasting and interpreting the system dynami…

    Submitted 14 March, 2024; v1 submitted 19 July, 2023; originally announced July 2023.
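Entry 3 treats the representation as the object to learn; once a representation phi is fixed, the transfer operator restricted to its span can be estimated by least squares on consecutive pairs (phi(x_t), phi(x_{t+1})). The sketch below skips representation learning entirely: it assumes a toy AR(1) process and the hand-picked features phi(x) = [1, x], for which E[phi(x_{t+1}) | x_t] is exactly linear in phi(x_t), and it uses plain least squares rather than the estimator studied in the paper. All names are hypothetical.

```python
import numpy as np


def simulate_ar1(a, sigma, n, seed=0):
    """Simulate x_{t+1} = a * x_t + sigma * noise, a toy time-homogeneous stochastic system."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(n - 1):
        x[t + 1] = a * x[t] + sigma * rng.standard_normal()
    return x


def features(x):
    """Hand-picked representation phi(x) = [1, x] (an assumption; the paper learns phi)."""
    return np.stack([np.ones_like(x), x], axis=1)


def estimate_transfer_operator(x):
    """Least-squares fit of K such that E[phi(x_{t+1}) | x_t] ~= K @ phi(x_t)."""
    phi_now, phi_next = features(x[:-1]), features(x[1:])
    # Solve phi_now @ K^T ~= phi_next column by column.
    K_T, *_ = np.linalg.lstsq(phi_now, phi_next, rcond=None)
    return K_T.T


def forecast_mean(K, x0, steps):
    """Predicted E[x_{t+steps} | x_t = x0]: second coordinate of K^steps phi(x0)."""
    phi0 = features(np.array([x0]))[0]
    return (np.linalg.matrix_power(K, steps) @ phi0)[1]
```

For this process the fitted K should be close to [[1, 0], [0, a]], and powers of K give multi-step mean forecasts.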

  4. arXiv:2206.03150

    stat.ML cs.LG

    Group Meritocratic Fairness in Linear Contextual Bandits

    Authors: Riccardo Grazzi, Arya Akhavan, John Isak Texas Falk, Leonardo Cella, Massimiliano Pontil

    Abstract: We study the linear contextual bandit problem where an agent has to select one candidate from a pool and each candidate belongs to a sensitive group. In this setting, candidates' rewards may not be directly comparable between groups, for example when the agent is an employer hiring candidates from different ethnic groups and some groups have a lower reward due to discriminatory bias and/or social…

    Submitted 20 December, 2022; v1 submitted 7 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022. Code for the experiments at https://github.com/CSML-IIT-UCL/GMFbandits

  5. arXiv:2202.03397

    stat.ML cs.LG math.OC

    Bilevel Optimization with a Lower-level Contraction: Optimal Sample Complexity without Warm-start

    Authors: Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo

    Abstract: We analyse a general class of bilevel problems, in which the upper-level problem consists in the minimization of a smooth objective function and the lower-level problem is to find the fixed point of a smooth contraction map. This type of problem includes instances of meta-learning, equilibrium models, hyperparameter optimization and data poisoning adversarial attacks. Several recent works have pro…

    Submitted 16 November, 2023; v1 submitted 7 February, 2022; originally announced February 2022.

    Comments: Corrected Remark 18 + other small edits. Code at https://github.com/CSML-IIT-UCL/bioptexps

    Journal ref: Journal of Machine Learning Research, volume 24, number 167, pages 1-37, year 2023
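For the bilevel setting of entry 5, the hypergradient of an upper objective E(w(lam)), where w(lam) = Phi(w(lam), lam), follows from the implicit function theorem: grad f(lam) = (dPhi/dlam)^T v, where v solves (I - dPhi/dw)^T v = grad_w E (when E has no direct dependence on lam). A minimal sketch, assuming ridge regression as the lower-level problem, squared validation error as E, t steps of the fixed-point map for w and k fixed-point steps for the linear system; these choices and the function name are illustrative, not taken from the paper or from the bioptexps repository.

```python
import numpy as np


def hypergradient_aid_fp(lam, X_tr, y_tr, X_val, y_val, t=500, k=500):
    """AID hypergradient d/dlam of E(w(lam)) = 0.5 * ||X_val w - y_val||^2.

    Lower level: w(lam) is the fixed point of the gradient-descent map
        Phi(w, lam) = w - gamma * (X_tr^T (X_tr w - y_tr) + lam * w),
    a contraction for the step size below. The fixed point (the ridge solution)
    does not depend on gamma, so gamma may be chosen from lam.
    """
    n = X_tr.shape[1]
    H = X_tr.T @ X_tr
    gamma = 1.0 / (np.linalg.eigvalsh(H).max() + lam)
    # t steps of the lower-level fixed-point iteration
    w = np.zeros(n)
    for _ in range(t):
        w = w - gamma * (H @ w - X_tr.T @ y_tr + lam * w)
    grad_E = X_val.T @ (X_val @ w - y_val)
    J = np.eye(n) - gamma * (H + lam * np.eye(n))   # dPhi/dw
    # k fixed-point steps on the linear system (I - J^T) v = grad_E
    v = np.zeros(n)
    for _ in range(k):
        v = J.T @ v + grad_E
    d_lam_phi = -gamma * w                          # dPhi/dlam
    return d_lam_phi @ v
```

Because Phi is a contraction, both inner loops converge linearly, so the error of this estimate shrinks geometrically in t and k.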

  6. arXiv:2111.03418

    cs.LG cs.AI stat.ML

    Meta-Forecasting by combining Global Deep Representations with Local Adaptation

    Authors: Riccardo Grazzi, Valentin Flunkert, David Salinas, Tim Januschowski, Matthias Seeger, Cedric Archambeau

    Abstract: While classical time series forecasting considers individual time series in isolation, recent advances based on deep learning showed that jointly learning from a large pool of related time series can boost the forecasting accuracy. However, the accuracy of these methods suffers greatly when modeling out-of-sample time series, significantly limiting their applicability compared to classical forecas…

    Submitted 12 November, 2021; v1 submitted 5 November, 2021; originally announced November 2021.

  7. arXiv:2011.07122

    stat.ML cs.LG

    Convergence Properties of Stochastic Hypergradients

    Authors: Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo

    Abstract: Bilevel optimization problems are receiving increasing attention in machine learning as they provide a natural framework for hyperparameter optimization and meta-learning. A key step to tackle these problems is the efficient computation of the gradient of the upper-level objective (hypergradient). In this work, we study stochastic approximation schemes for the hypergradient, which are important wh…

    Submitted 12 April, 2021; v1 submitted 13 November, 2020; originally announced November 2020.

    Comments: added experiments, a table of notation and some comments. 22 pages

    Journal ref: Proceedings of The 24th International Conference on Artificial Intelligence and Statistics (AISTATS 2021), PMLR 130:3826-3834

  8. arXiv:2006.16218

    stat.ML cs.LG

    On the Iteration Complexity of Hypergradient Computation

    Authors: Riccardo Grazzi, Luca Franceschi, Massimiliano Pontil, Saverio Salzo

    Abstract: We study a general class of bilevel problems, consisting in the minimization of an upper-level objective which depends on the solution to a parametric fixed-point equation. Important instances arising in machine learning include hyperparameter optimization, meta-learning, and certain graph and recurrent neural networks. Typically the gradient of the upper-level objective (hypergradient) is hard or…

    Submitted 10 July, 2020; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: accepted at ICML 2020; 19 pages, 4 figures; code at https://github.com/prolearner/hypertorch (corrected typos and one reference)

  9. arXiv:1903.10399

    cs.LG stat.ML

    Learning-to-Learn Stochastic Gradient Descent with Biased Regularization

    Authors: Giulia Denevi, Carlo Ciliberto, Riccardo Grazzi, Massimiliano Pontil

    Abstract: We study the problem of learning-to-learn: inferring a learning algorithm that works well on tasks sampled from an unknown distribution. As a class of algorithms, we consider Stochastic Gradient Descent on the true risk regularized by the squared Euclidean distance to a bias vector. We present an average excess risk bound for such a learning algorithm. This result quantifies the potential benefit of u…

    Submitted 25 March, 2019; originally announced March 2019.

    Comments: 37 pages, 8 figures
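The within-task algorithm in entry 9, SGD on the regularized risk L(w) + (lam/2) ||w - h||^2 with bias vector h, can be sketched in a few lines. Assumed details, not stated in the abstract: squared loss, one pass over i.i.d. samples, initialization at w = h, step size 1/(lam * k) and iterate averaging. Learning h across tasks (the learning-to-learn part) is not shown, and the function name is hypothetical.

```python
import numpy as np


def sgd_biased_regularization(X, y, h, lam):
    """One pass of SGD on squared loss + (lam / 2) * ||w - h||^2; returns the averaged iterate.

    Assumed details (illustrative): start at w = h, step size 1 / (lam * k).
    """
    w = h.astype(float)               # copy, so h itself is not modified
    w_avg = np.zeros_like(w)
    for k, (x_k, y_k) in enumerate(zip(X, y), start=1):
        # stochastic gradient of the loss on sample k plus gradient of the biased regularizer
        grad = (x_k @ w - y_k) * x_k + lam * (w - h)
        w = w - grad / (lam * k)
        w_avg += (w - w_avg) / k      # running mean of the iterates produced so far
    return w_avg
```

Intuitively, a bias vector close to the tasks' solutions keeps the pull of the regularizer harmless, which is why choosing h well across tasks can pay off.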

  10. arXiv:1806.04941

    cs.MS cs.LG stat.ML

    Far-HO: A Bilevel Programming Package for Hyperparameter Optimization and Meta-Learning

    Authors: Luca Franceschi, Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo, Paolo Frasconi

    Abstract: In (Franceschi et al., 2018) we proposed a unified mathematical framework, grounded on bilevel programming, that encompasses gradient-based hyperparameter optimization and meta-learning. We formulated an approximate version of the problem where the inner objective is solved iteratively, and gave sufficient conditions ensuring convergence to the exact problem. In this work we show how to optimize l…

    Submitted 13 June, 2018; originally announced June 2018.

    Comments: This submission is a reduced version of (Franceschi et al., arXiv:1806.04910), which has been accepted at the main ICML 2018 conference. In this paper we illustrate the software framework, material that could not be included in the conference paper.

  11. arXiv:1806.04910

    stat.ML cs.LG

    Bilevel Programming for Hyperparameter Optimization and Meta-Learning

    Authors: Luca Franceschi, Paolo Frasconi, Saverio Salzo, Riccardo Grazzi, Massimiliano Pontil

    Abstract: We introduce a framework based on bilevel programming that unifies gradient-based hyperparameter optimization and meta-learning. We show that an approximate version of the bilevel problem can be solved by taking into explicit account the optimization dynamics for the inner objective. Depending on the specific setting, the outer variables take either the meaning of hyperparameters in a supervised l…

    Submitted 3 July, 2018; v1 submitted 13 June, 2018; originally announced June 2018.

    Comments: ICML 2018; code for replicating experiments at https://github.com/prolearner/hyper-representation, main package (Far-HO) at https://github.com/lucfra/FAR-HO
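Entry 11's approximate bilevel problem replaces the inner solution by the output of T iterations of an optimizer and differentiates through those dynamics. A minimal NumPy sketch under illustrative assumptions (ridge regression inner problem, plain gradient descent with a fixed step gamma, squared validation error as the outer objective; none of these choices comes from the abstract) computes that hypergradient by reverse-mode differentiation, storing all iterates in the forward pass. This is not the Far-HO API, and the function name is hypothetical.

```python
import numpy as np


def reverse_mode_itd(lam, X_tr, y_tr, X_val, y_val, gamma, T):
    """Hypergradient of E(w_T(lam)) = 0.5 * ||X_val w_T - y_val||^2, where w_T is the
    result of T gradient-descent steps on 0.5 * ||X_tr w - y_tr||^2 + 0.5 * lam * ||w||^2,
    obtained by backpropagating through the unrolled iterations."""
    n = X_tr.shape[1]
    H = X_tr.T @ X_tr
    c = X_tr.T @ y_tr
    ws = [np.zeros(n)]                          # forward pass: store every iterate
    for _ in range(T):
        w = ws[-1]
        ws.append(w - gamma * (H @ w - c + lam * w))
    p = X_val.T @ (X_val @ ws[-1] - y_val)      # dE/dw_T
    J = np.eye(n) - gamma * (H + lam * np.eye(n))  # d w_{t+1} / d w_t
    hypergrad = 0.0
    for t in range(T - 1, -1, -1):              # backward pass
        hypergrad += (-gamma * ws[t]) @ p       # partial d w_{t+1} / d lam = -gamma * w_t
        p = J.T @ p                             # dE/dw_t from dE/dw_{t+1}
    return hypergrad
```

Reverse mode stores every iterate, so memory grows linearly in T; forward mode avoids that cost but propagates one derivative per hyperparameter.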