Showing 1–22 of 22 results for author: Pedregosa, F

  1. arXiv:2212.04025  [pdf, other]

    cs.LG cs.AI stat.ML

    A Novel Stochastic Gradient Descent Algorithm for Learning Principal Subspaces

    Authors: Charline Le Lan, Joshua Greaves, Jesse Farebrother, Mark Rowland, Fabian Pedregosa, Rishabh Agarwal, Marc G. Bellemare

    Abstract: Many machine learning problems encode their data as a matrix with a possibly very large number of rows and columns. In several applications like neuroscience, image compression or deep reinforcement learning, the principal subspace of such a matrix provides a useful, low-dimensional representation of individual data. Here, we are interested in determining the $d$-dimensional principal subspace of…

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: 8 pages in main content, 2 pages of bibliography and 5 pages in Appendix

  2. arXiv:2211.04659  [pdf, other]

    cs.LG math.OC stat.ML

    When is Momentum Extragradient Optimal? A Polynomial-Based Analysis

    Authors: Junhyung Lyle Kim, Gauthier Gidel, Anastasios Kyrillidis, Fabian Pedregosa

    Abstract: The extragradient method has gained popularity due to its robust convergence properties for differentiable games. Unlike single-objective optimization, game dynamics involve complex interactions reflected by the eigenvalues of the game vector field's Jacobian scattered across the complex plane. This complexity can cause the simple gradient method to diverge, even for bilinear games, while the extr…

    Submitted 10 February, 2024; v1 submitted 8 November, 2022; originally announced November 2022.

  3. arXiv:2209.13271  [pdf, other]

    math.OC stat.ML

    The Curse of Unrolling: Rate of Differentiating Through Optimization

    Authors: Damien Scieur, Quentin Bertrand, Gauthier Gidel, Fabian Pedregosa

    Abstract: Computing the Jacobian of the solution of an optimization problem is a central problem in machine learning, with applications in hyperparameter optimization, meta-learning, optimization as a layer, and dataset distillation, to name a few. Unrolled differentiation is a popular heuristic that approximates the solution using an iterative solver and differentiates it through the computational path. Th…

    Submitted 25 August, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

  4. arXiv:2105.15183  [pdf, other]

    cs.LG math.NA stat.ML

    Efficient and Modular Implicit Differentiation

    Authors: Mathieu Blondel, Quentin Berthet, Marco Cuturi, Roy Frostig, Stephan Hoyer, Felipe Llinares-López, Fabian Pedregosa, Jean-Philippe Vert

    Abstract: Automatic differentiation (autodiff) has revolutionized machine learning. It allows to express complex computations by composing elementary ones in creative ways and removes the burden of computing their derivatives by hand. More recently, differentiation of optimization problem solutions has attracted widespread attention with applications such as optimization layers, and in bi-level problems suc…

    Submitted 12 October, 2022; v1 submitted 31 May, 2021; originally announced May 2021.

    Comments: V3: added more related work and Jacobian precision figure

  5. arXiv:2105.09240  [pdf, other]

    cs.LG stat.ML

    Boosting Variational Inference With Locally Adaptive Step-Sizes

    Authors: Gideon Dresdner, Saurav Shekhar, Fabian Pedregosa, Francesco Locatello, Gunnar Rätsch

    Abstract: Variational Inference makes a trade-off between the capacity of the variational family and the tractability of finding an approximate posterior distribution. Instead, Boosting Variational Inference allows practitioners to obtain increasingly good posterior approximations by spending more compute. The main obstacle to widespread adoption of Boosting Variational Inference is the amount of resources…

    Submitted 19 May, 2021; originally announced May 2021.

  6. arXiv:2102.08868  [pdf, other]

    cs.LG cs.CV stat.ML

    Bridging the Gap Between Adversarial Robustness and Optimization Bias

    Authors: Fartash Faghri, Sven Gowal, Cristina Vasconcelos, David J. Fleet, Fabian Pedregosa, Nicolas Le Roux

    Abstract: We demonstrate that the choice of optimizer, neural network architecture, and regularizer significantly affect the adversarial robustness of linear neural networks, providing guarantees without the need for adversarial training. To this end, we revisit a known result linking maximally robust classifiers and minimum norm solutions, and combine it with recent results on the implicit bias of optimize…

    Submitted 7 June, 2021; v1 submitted 17 February, 2021; originally announced February 2021.

    Comments: New CIFAR-10 experiments and Fourier attack variations

  7. arXiv:2102.04396  [pdf, other]

    math.OC cs.LG math.PR stat.ML

    SGD in the Large: Average-case Analysis, Asymptotics, and Stepsize Criticality

    Authors: Courtney Paquette, Kiwon Lee, Fabian Pedregosa, Elliot Paquette

    Abstract: We propose a new framework, inspired by random matrix theory, for analyzing the dynamics of stochastic gradient descent (SGD) when both number of samples and dimensions are large. This framework applies to any fixed stepsize and the finite sum setting. Using this new framework, we show that the dynamics of SGD on a least squares problem with random data become deterministic in the large sample and…

    Submitted 8 February, 2021; originally announced February 2021.

  8. arXiv:2006.04299  [pdf, other]

    math.OC stat.ML

    Halting Time is Predictable for Large Models: A Universality Property and Average-case Analysis

    Authors: Courtney Paquette, Bart van Merriënboer, Elliot Paquette, Fabian Pedregosa

    Abstract: Average-case analysis computes the complexity of an algorithm averaged over all possible inputs. Compared to worst-case analysis, it is more representative of the typical behavior of an algorithm, but remains largely unexplored in optimization. One difficulty is that the analysis can depend on the probability distribution of the inputs to the model. However, we show that this is not the case for a…

    Submitted 2 October, 2021; v1 submitted 7 June, 2020; originally announced June 2020.

  9. arXiv:2002.08056  [pdf, other]

    cs.LG stat.ML

    The Geometry of Sign Gradient Descent

    Authors: Lukas Balles, Fabian Pedregosa, Nicolas Le Roux

    Abstract: Sign-based optimization methods have become popular in machine learning due to their favorable communication cost in distributed optimization and their surprisingly good performance in neural network training. Furthermore, they are closely connected to so-called adaptive gradient methods like Adam. Recent works on signSGD have used a non-standard "separable smoothness" assumption, whereas some old…

    Submitted 19 February, 2020; originally announced February 2020.

  10. arXiv:1910.05271  [pdf, other]

    q-bio.NC cs.LG stat.ML stat.OT

    A Test for Shared Patterns in Cross-modal Brain Activation Analysis

    Authors: Elena Kalinina, Fabian Pedregosa, Vittorio Iacovella, Emanuele Olivetti, Paolo Avesani

    Abstract: Determining the extent to which different cognitive modalities (understood here as the set of cognitive processes underlying the elaboration of a stimulus by the brain) rely on overlapping neural representations is a fundamental issue in cognitive neuroscience. In the last decade, the identification of shared activity patterns has been mostly framed as a supervised learning problem. For instance,…

    Submitted 8 October, 2019; originally announced October 2019.

    Comments: 5 figures, tables after References (as required by SciRep template)

  11. arXiv:1906.10732  [pdf, other]

    cs.LG cs.CV stat.ML

    The Difficulty of Training Sparse Neural Networks

    Authors: Utku Evci, Fabian Pedregosa, Aidan Gomez, Erich Elsen

    Abstract: We investigate the difficulties of training sparse neural networks and make new observations about optimization dynamics and the energy landscape within the sparse regime. Recent work of \citep{Gale2019, Liu2018} has shown that sparse ResNet-50 architectures trained on ImageNet-2012 dataset converge to solutions that are significantly worse than those found by pruning. We show that, despite the fa…

    Submitted 7 October, 2020; v1 submitted 25 June, 2019; originally announced June 2019.

    Comments: sparse networks, pruning, energy landscape, sparsity

  12. arXiv:1906.07774  [pdf, other]

    cs.LG stat.ML

    On the interplay between noise and curvature and its effect on optimization and generalization

    Authors: Valentin Thomas, Fabian Pedregosa, Bart van Merriënboer, Pierre-Antoine Manzagol, Yoshua Bengio, Nicolas Le Roux

    Abstract: The speed at which one can minimize an expected loss using stochastic methods depends on two properties: the curvature of the loss and the variance of the gradients. While most previous works focus on one or the other of these properties, we explore how their interaction affects optimization speed. Further, as the ultimate goal is good generalization performance, we clarify how both curvature and…

    Submitted 6 April, 2020; v1 submitted 18 June, 2019; originally announced June 2019.

    Comments: Accepted to AISTATS 2020

  13. arXiv:1804.03176  [pdf, other]

    math.OC cs.LG stat.ML

    Frank-Wolfe Splitting via Augmented Lagrangian Method

    Authors: Gauthier Gidel, Fabian Pedregosa, Simon Lacoste-Julien

    Abstract: Minimizing a function over an intersection of convex sets is an important task in optimization that is often much more challenging than minimizing it over each individual constraint set. While traditional methods such as Frank-Wolfe (FW) or proximal gradient descent assume access to a linear or quadratic oracle on the intersection, splitting techniques take advantage of the structure of each sets,…

    Submitted 9 April, 2018; originally announced April 2018.

    Comments: Appears in: Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS 2018). 30 pages

    MSC Class: 90C52; 90C90; 68T05 ACM Class: G.1.6; I.2.6

  14. arXiv:1803.07348  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Frank-Wolfe with Subsampling Oracle

    Authors: Thomas Kerdreux, Fabian Pedregosa, Alexandre d'Aspremont

    Abstract: We analyze two novel randomized variants of the Frank-Wolfe (FW) or conditional gradient algorithm. While classical FW algorithms require solving a linear minimization problem over the domain at each iteration, the proposed method only requires to solve a linear minimization problem over a small subset of the original domain. The first algorithm that we propose is a randomized variant of th…

    Submitted 20 March, 2018; originally announced March 2018.

  15. arXiv:1801.03749  [pdf, other]

    math.OC cs.LG stat.ML

    Improved asynchronous parallel optimization analysis for stochastic incremental methods

    Authors: Rémi Leblond, Fabian Pedregosa, Simon Lacoste-Julien

    Abstract: As datasets continue to increase in size and multi-core computer architectures are developed, asynchronous parallel optimization algorithms become more and more essential to the field of Machine Learning. Unfortunately, conducting the theoretical analysis of asynchronous methods is difficult, notably due to the introduction of delay and inconsistency in inherently sequential algorithms. Handling thes…

    Submitted 21 March, 2019; v1 submitted 11 January, 2018; originally announced January 2018.

    Comments: 67 pages, published in JMLR, can be found online at http://jmlr.org/papers/v19/17-650.html. arXiv admin note: substantial text overlap with arXiv:1606.04809

  16. arXiv:1707.06468  [pdf, other]

    math.OC cs.LG stat.ML

    Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization

    Authors: Fabian Pedregosa, Rémi Leblond, Simon Lacoste-Julien

    Abstract: Due to their simplicity and excellent performance, parallel asynchronous variants of stochastic gradient descent have become popular methods to solve a wide range of large-scale optimization problems on multi-core architectures. Yet, despite their practical success, support for nonsmooth objectives is still lacking, making them unsuitable for many problems of interest in machine learning, such as…

    Submitted 5 November, 2017; v1 submitted 20 July, 2017; originally announced July 2017.

    Comments: Appears in Advances in Neural Information Processing Systems 30 (NIPS 2017), 28 pages

    MSC Class: 90C52; 90C90; 68T05 ACM Class: G.1.6; I.2.6

    Journal ref: Advances in Neural Information Processing Systems 30 (NIPS 2017)

  17. arXiv:1610.07830  [pdf, ps, other]

    stat.ML math.OC

    On the convergence rate of the three operator splitting scheme

    Authors: Fabian Pedregosa

    Abstract: The three operator splitting scheme was recently proposed by [Davis and Yin, 2015] as a method to optimize composite objective functions with one convex smooth term and two convex (possibly non-smooth) terms for which we have access to their proximity operator. In this short note we provide an alternative proof for the sublinear rate of convergence of this method.

    Submitted 25 June, 2021; v1 submitted 25 October, 2016; originally announced October 2016.

    Comments: Fixed typo in Lemma 3

  18. arXiv:1606.04809  [pdf, other]

    math.OC cs.LG stat.ML

    ASAGA: Asynchronous Parallel SAGA

    Authors: Rémi Leblond, Fabian Pedregosa, Simon Lacoste-Julien

    Abstract: We describe ASAGA, an asynchronous parallel version of the incremental gradient algorithm SAGA that enjoys fast linear convergence rates. Through a novel perspective, we revisit and clarify a subtle but important technical issue present in a large fraction of the recent convergence rate proofs for asynchronous parallel optimization algorithms, and propose a simplification of the recently introduce…

    Submitted 8 November, 2017; v1 submitted 15 June, 2016; originally announced June 2016.

    Comments: Appears in: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017), 37 pages

  19. arXiv:1602.02355  [pdf, other]

    stat.ML cs.LG math.OC

    Hyperparameter optimization with approximate gradient

    Authors: Fabian Pedregosa

    Abstract: Most models in machine learning contain at least one hyperparameter to control for model complexity. Choosing an appropriate set of hyperparameters is both crucial in terms of model accuracy and computationally challenging. In this work we propose an algorithm for the optimization of continuous hyperparameters using inexact gradient information. An advantage of this method is that hyperparameters…

    Submitted 21 November, 2022; v1 submitted 7 February, 2016; originally announced February 2016.

    Comments: Fixes error in proof of Theorem 2

  20. arXiv:1412.3919  [pdf, other]

    cs.LG cs.CV stat.ML

    Machine Learning for Neuroimaging with Scikit-Learn

    Authors: Alexandre Abraham, Fabian Pedregosa, Michael Eickenberg, Philippe Gervais, Andreas Muller, Jean Kossaifi, Alexandre Gramfort, Bertrand Thirion, Gaël Varoquaux

    Abstract: Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g. multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learnin…

    Submitted 12 December, 2014; originally announced December 2014.

    Comments: Frontiers in neuroscience, Frontiers Research Foundation, 2013, pp.15

  21. arXiv:1305.2788  [pdf, other]

    cs.LG stat.AP

    HRF estimation improves sensitivity of fMRI encoding and decoding models

    Authors: Fabian Pedregosa, Michael Eickenberg, Bertrand Thirion, Alexandre Gramfort

    Abstract: Extracting activation patterns from functional Magnetic Resonance Images (fMRI) datasets remains challenging in rapid-event designs due to the inherent delay of blood oxygen level-dependent (BOLD) signal. The general linear model (GLM) allows to estimate the activation from a design matrix and a fixed hemodynamic response function (HRF). However, the HRF is known to vary substantially between subj…

    Submitted 13 May, 2013; originally announced May 2013.

    Comments: 3rd International Workshop on Pattern Recognition in NeuroImaging (2013)

  22. arXiv:1207.3520  [pdf, other]

    cs.LG stat.ML

    Improved brain pattern recovery through ranking approaches

    Authors: Fabian Pedregosa, Alexandre Gramfort, Gaël Varoquaux, Bertrand Thirion, Christophe Pallier, Elodie Cauvet

    Abstract: Inferring the functional specificity of brain regions from functional Magnetic Resonance Images (fMRI) data is a challenging statistical problem. While the General Linear Model (GLM) remains the standard approach for brain mapping, supervised learning techniques (a.k.a. decoding) have proven to be useful to capture multivariate statistical effects distributed across voxels and brain regions. Up t…

    Submitted 15 July, 2012; originally announced July 2012.

    Journal ref: Pattern Recognition in NeuroImaging (PRNI 2012) (2012)