Showing 1–50 of 65 results for author: Lacoste-Julien, S

  1. arXiv:2401.04890  [pdf, other]

    stat.ML cs.LG

    Nonparametric Partial Disentanglement via Mechanism Sparsity: Sparse Actions, Interventions and Sparse Temporal Dependencies

    Authors: Sébastien Lachapelle, Pau Rodríguez López, Yash Sharma, Katie Everett, Rémi Le Priol, Alexandre Lacoste, Simon Lacoste-Julien

    Abstract: This work introduces a novel principle for disentanglement we call mechanism sparsity regularization, which applies when the latent factors of interest depend sparsely on observed auxiliary variables and/or past latent factors. We propose a representation learning method that induces disentanglement by simultaneously learning the latent factors and the sparse causal graphical model that explains t…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: 88 pages

    ACM Class: I.2.6; I.5.1

  2. arXiv:2311.03096  [pdf, other]

    cs.LG stat.ML

    Weight-Sharing Regularization

    Authors: Mehran Shakerinava, Motahareh Sohrabi, Siamak Ravanbakhsh, Simon Lacoste-Julien

    Abstract: Weight-sharing is ubiquitous in deep learning. Motivated by this, we propose a "weight-sharing regularization" penalty on the weights $w \in \mathbb{R}^d$ of a neural network, defined as $\mathcal{R}(w) = \frac{1}{d - 1}\sum_{i > j}^d |w_i - w_j|$. We study the proximal mapping of $\mathcal{R}$ and provide an intuitive interpretation of it in terms of a physical system of interacting particles. We…

    Submitted 10 March, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: Our code is available at https://github.com/motahareh-sohrabi/weight-sharing-regularization

  3. arXiv:2307.02598  [pdf, other]

    cs.LG stat.ML

    Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation

    Authors: Sébastien Lachapelle, Divyat Mahajan, Ioannis Mitliagkas, Simon Lacoste-Julien

    Abstract: We tackle the problems of latent variables identification and "out-of-support" image generation in representation learning. We show that both are possible for a class of decoders that we call additive, which are reminiscent of decoders used for object-centric representation learning (OCRL) and well suited for images that can be decomposed as a sum of object-specific images. We provide conditions…

    Submitted 2 November, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: Appears in: Advances in Neural Information Processing Systems 36 (NeurIPS 2023). 39 pages

    ACM Class: I.2.6; I.5.1

  4. arXiv:2303.04143  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?

    Authors: Boris Knyazev, Doha Hwang, Simon Lacoste-Julien

    Abstract: Pretraining a neural network on a large dataset is becoming a cornerstone in machine learning that is within the reach of only a few communities with large resources. We aim at an ambitious goal of democratizing pretraining. Towards that goal, we train and release a single neural network that can predict high quality ImageNet parameters of other neural networks. By using predicted parameters for i…

    Submitted 31 May, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: ICML 2023, camera ready (7 tables with extra results added), code and models are at https://github.com/SamsungSAILMontreal/ghn3

  5. arXiv:2211.14666  [pdf, other]

    cs.LG stat.ML

    Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning

    Authors: Sébastien Lachapelle, Tristan Deleu, Divyat Mahajan, Ioannis Mitliagkas, Yoshua Bengio, Simon Lacoste-Julien, Quentin Bertrand

    Abstract: Although disentangled representations are often said to be beneficial for downstream tasks, current empirical and theoretical understanding is limited. In this work, we provide evidence that disentangled representations coupled with sparse base-predictors improve generalization. In the context of multi-task learning, we prove a new identifiability result that provides conditions under which maxima…

    Submitted 6 June, 2023; v1 submitted 26 November, 2022; originally announced November 2022.

    Comments: Appears in: Fortieth International Conference on Machine Learning (ICML 2023). 36 pages

    ACM Class: I.2.6; I.5.1

  6. arXiv:2207.07732  [pdf, other]

    stat.ML cs.LG

    Partial Disentanglement via Mechanism Sparsity

    Authors: Sébastien Lachapelle, Simon Lacoste-Julien

    Abstract: Disentanglement via mechanism sparsity was introduced recently as a principled approach to extract latent factors without supervision when the causal graph relating them in time is sparse, and/or when actions are observed and affect them sparsely. However, this theory applies only to ground-truth graphs satisfying a specific criterion. In this work, we introduce a generalization of this theory whi…

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: Appears in: The First Workshop on Causal Representation Learning (CRL 2022) at UAI. 26 pages

  7. arXiv:2202.13903  [pdf, other]

    cs.LG stat.ML

    Bayesian Structure Learning with Generative Flow Networks

    Authors: Tristan Deleu, António Góis, Chris Emezue, Mansi Rankawat, Simon Lacoste-Julien, Stefan Bauer, Yoshua Bengio

    Abstract: In Bayesian structure learning, we are interested in inferring a distribution over the directed acyclic graph (DAG) structure of Bayesian networks, from data. Defining such a distribution is very challenging, due to the combinatorially large sample space, and approximations based on MCMC are often required. Recently, a novel class of probabilistic models, called Generative Flow Networks (GFlowNets…

    Submitted 28 June, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

  8. arXiv:2111.12193  [pdf, other]

    cs.LG stat.ML

    Multiset-Equivariant Set Prediction with Approximate Implicit Differentiation

    Authors: Yan Zhang, David W. Zhang, Simon Lacoste-Julien, Gertjan J. Burghouts, Cees G. M. Snoek

    Abstract: Most set prediction models in deep learning use set-equivariant operations, but they actually operate on multisets. We show that set-equivariant functions cannot represent certain functions on multisets, so we introduce the more appropriate notion of multiset-equivariance. We identify that the existing Deep Set Prediction Network (DSPN) can be multiset-equivariant without being hindered by set-equ…

    Submitted 3 February, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

    Comments: Published at International Conference on Learning Representations (ICLR) 2022

  9. arXiv:2111.06826  [pdf, other]

    stat.ML cs.LG math.ST

    Convergence Rates for the MAP of an Exponential Family and Stochastic Mirror Descent -- an Open Problem

    Authors: Rémi Le Priol, Frederik Kunstner, Damien Scieur, Simon Lacoste-Julien

    Abstract: We consider the problem of upper bounding the expected log-likelihood sub-optimality of the maximum likelihood estimate (MLE), or a conjugate maximum a posteriori (MAP) for an exponential family, in a non-asymptotic way. Surprisingly, we found no general solution to this problem in the literature. In particular, current theories do not hold for a Gaussian or in the interesting few samples regime.…

    Submitted 12 November, 2021; originally announced November 2021.

    Comments: 9 pages and 3 figures + Appendix

  10. arXiv:2107.10098  [pdf, other]

    stat.ML cs.LG

    Disentanglement via Mechanism Sparsity Regularization: A New Principle for Nonlinear ICA

    Authors: Sébastien Lachapelle, Pau Rodríguez López, Yash Sharma, Katie Everett, Rémi Le Priol, Alexandre Lacoste, Simon Lacoste-Julien

    Abstract: This work introduces a novel principle we call disentanglement via mechanism sparsity regularization, which can be applied when the latent factors of interest depend sparsely on past latent factors and/or observed auxiliary variables. We propose a representation learning method that induces disentanglement by simultaneously learning the latent factors and the sparse causal graphical model that rel…

    Submitted 23 February, 2022; v1 submitted 21 July, 2021; originally announced July 2021.

    Comments: Appears in: 1st Conference on Causal Learning and Reasoning (CLeaR 2022). 57 pages

    ACM Class: I.2.6; I.5.1

  11. arXiv:2107.00052  [pdf, other]

    cs.LG cs.GT math.OC stat.ML

    Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity

    Authors: Nicolas Loizou, Hugo Berard, Gauthier Gidel, Ioannis Mitliagkas, Simon Lacoste-Julien

    Abstract: Two of the most prominent algorithms for solving unconstrained smooth games are the classical stochastic gradient descent-ascent (SGDA) and the recently introduced stochastic consensus optimization (SCO) [Mescheder et al., 2017]. SGDA is known to converge to a stationary point for specific classes of games, but current convergence analyses require a bounded variance assumption. SCO is used success…

    Submitted 4 November, 2021; v1 submitted 30 June, 2021; originally announced July 2021.

    Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  12. arXiv:2102.09645  [pdf, other]

    cs.LG math.OC stat.ML

    SVRG Meets AdaGrad: Painless Variance Reduction

    Authors: Benjamin Dubois-Taine, Sharan Vaswani, Reza Babanezhad, Mark Schmidt, Simon Lacoste-Julien

    Abstract: Variance reduction (VR) methods for finite-sum minimization typically require the knowledge of problem-dependent constants that are often unknown and difficult to estimate. To address this, we use ideas from adaptive gradient methods to propose AdaSVRG, which is a more robust variant of SVRG, a common VR method. AdaSVRG uses AdaGrad in the inner loop of SVRG, making it robust to the choice of step…

    Submitted 2 November, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

  13. arXiv:2011.11150  [pdf, other]

    cs.LG stat.ML

    On the Convergence of Continuous Constrained Optimization for Structure Learning

    Authors: Ignavier Ng, Sébastien Lachapelle, Nan Rosemary Ke, Simon Lacoste-Julien, Kun Zhang

    Abstract: Recently, structure learning of directed acyclic graphs (DAGs) has been formulated as a continuous optimization problem by leveraging an algebraic characterization of acyclicity. The constrained problem is solved using the augmented Lagrangian method (ALM) which is often preferred to the quadratic penalty method (QPM) by virtue of its standard convergence result that does not require the penalty c…

    Submitted 10 April, 2022; v1 submitted 22 November, 2020; originally announced November 2020.

    Comments: AISTATS 2022. A preliminary version of this paper was presented at the NeurIPS 2020 Workshop on Causal Discovery and Causality-Inspired Machine Learning. The code is available at https://github.com/ignavierng/notears-convergence

  14. arXiv:2009.12501  [pdf, other]

    cs.LG math.OC stat.ML

    Flight-connection Prediction for Airline Crew Scheduling to Construct Initial Clusters for OR Optimizer

    Authors: Yassine Yaakoubi, François Soumis, Simon Lacoste-Julien

    Abstract: We present a case study of using machine learning classification algorithms to initialize a large-scale commercial solver (GENCOL) based on column generation in the context of the airline crew pairing problem, where small savings of as little as 1% translate to increasing annual revenue by dozens of millions of dollars in a large airline. Under the imitation learning framework, we focus on the pro…

    Submitted 2 March, 2021; v1 submitted 25 September, 2020; originally announced September 2020.

    Comments: First publication on the "Cahiers du GERAD" series in April 2019

    Report number: G-2019-26

  15. arXiv:2008.00938  [pdf, other]

    cs.LG stat.ML

    Implicit Regularization via Neural Feature Alignment

    Authors: Aristide Baratin, Thomas George, César Laurent, R Devon Hjelm, Guillaume Lajoie, Pascal Vincent, Simon Lacoste-Julien

    Abstract: We approach the problem of implicit regularization in deep learning from a geometrical viewpoint. We highlight a regularization effect induced by a dynamical alignment of the neural tangent features introduced by Jacot et al., along a small number of task-relevant directions. This can be interpreted as a combined mechanism of feature selection and compression. By extrapolating a new analysis of Rad…

    Submitted 16 March, 2021; v1 submitted 3 August, 2020; originally announced August 2020.

    Comments: AISTATS 2021

  16. arXiv:2007.04202  [pdf, other]

    cs.LG cs.GT math.OC stat.ML

    Stochastic Hamiltonian Gradient Methods for Smooth Games

    Authors: Nicolas Loizou, Hugo Berard, Alexia Jolicoeur-Martineau, Pascal Vincent, Simon Lacoste-Julien, Ioannis Mitliagkas

    Abstract: The success of adversarial formulations in machine learning has brought renewed motivation for smooth games. In this work, we focus on the class of stochastic Hamiltonian methods and provide the first convergence guarantees for certain classes of stochastic smooth games. We propose a novel unbiased estimator for the stochastic Hamiltonian gradient descent (SHGD) and highlight its benefits. Using t…

    Submitted 8 July, 2020; originally announced July 2020.

    Comments: ICML 2020 - Proceedings of the 37th International Conference on Machine Learning

  17. arXiv:2007.01754  [pdf, other]

    cs.LG stat.ML

    Differentiable Causal Discovery from Interventional Data

    Authors: Philippe Brouillard, Sébastien Lachapelle, Alexandre Lacoste, Simon Lacoste-Julien, Alexandre Drouin

    Abstract: Learning a causal directed acyclic graph from data is a challenging task that involves solving a combinatorial problem for which the solution is not always identifiable. A new line of work reformulates this problem as a continuous constrained optimization one, which is solved via the augmented Lagrangian method. However, most methods based on this idea do not make use of interventional data, which…

    Submitted 3 November, 2020; v1 submitted 3 July, 2020; originally announced July 2020.

    Comments: Appears in: Advances in Neural Information Processing Systems 33 (NeurIPS 2020). 46 pages

    ACM Class: I.2.6; I.5.1

  18. arXiv:2007.00720  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Adversarial Example Games

    Authors: Avishek Joey Bose, Gauthier Gidel, Hugo Berard, Andre Cianflone, Pascal Vincent, Simon Lacoste-Julien, William L. Hamilton

    Abstract: The existence of adversarial examples capable of fooling trained neural network classifiers calls for a much better understanding of possible attacks to guide the development of safeguards against them. This includes attack methods in the challenging non-interactive blackbox setting, where adversarial attacks are generated without any access, including queries, to the target model. Prior attacks i…

    Submitted 8 January, 2021; v1 submitted 1 July, 2020; originally announced July 2020.

    Comments: Appears in: Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

  19. arXiv:2006.06835  [pdf, other]

    cs.LG math.OC stat.ML

    Adaptive Gradient Methods Converge Faster with Over-Parameterization (but you should do a line-search)

    Authors: Sharan Vaswani, Issam Laradji, Frederik Kunstner, Si Yi Meng, Mark Schmidt, Simon Lacoste-Julien

    Abstract: Adaptive gradient methods are typically used for training over-parameterized models. To better understand their behaviour, we study a simplistic setting -- smooth, convex losses with models over-parameterized enough to interpolate the data. In this setting, we prove that AMSGrad with constant step-size and momentum converges to the minimizer at a faster $O(1/T)$ rate. When interpolation is only ap…

    Submitted 18 February, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

  20. arXiv:2006.06821  [pdf, other]

    cs.LG stat.ML

    To Each Optimizer a Norm, To Each Norm its Generalization

    Authors: Sharan Vaswani, Reza Babanezhad, Jose Gallego-Posada, Aaron Mishkin, Simon Lacoste-Julien, Nicolas Le Roux

    Abstract: We study the implicit regularization of optimization methods for linear models interpolating the training data in the under-parametrized and over-parametrized regimes. Since it is difficult to determine whether an optimizer converges to solutions that minimize a known norm, we flip the problem and investigate what is the corresponding norm minimized by an interpolating solution. Using this reasoni…

    Submitted 11 June, 2020; originally announced June 2020.

  21. arXiv:2005.09136  [pdf, other]

    stat.ML cs.LG

    An Analysis of the Adaptation Speed of Causal Models

    Authors: Rémi Le Priol, Reza Babanezhad Harikandeh, Yoshua Bengio, Simon Lacoste-Julien

    Abstract: Consider a collection of datasets generated by unknown interventions on an unknown structural causal model $G$. Recently, Bengio et al. (2020) conjectured that among all candidate models, $G$ is the fastest to adapt from one dataset to another, along with promising experiments. Indeed, intuitively $G$ has fewer mechanisms to adapt, but this justification is incomplete. Our contribution is a more th…

    Submitted 25 February, 2021; v1 submitted 18 May, 2020; originally announced May 2020.

    Comments: Published at AISTATS 2021. 10 pages main articles, 19 pages supplement, 10 figures

  22. arXiv:2002.10542  [pdf, other]

    math.OC cs.LG stat.ML

    Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence

    Authors: Nicolas Loizou, Sharan Vaswani, Issam Laradji, Simon Lacoste-Julien

    Abstract: We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly used in the subgradient method. Although computing the Polyak step-size requires knowledge of the optimal function values, this information is readily available for typical modern machine learning applications. Consequently, the proposed stochastic Polyak step-size (SPS) is an attractive choice for setting th…

    Submitted 22 March, 2021; v1 submitted 24 February, 2020; originally announced February 2020.

    Comments: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021

  23. arXiv:2001.00602  [pdf, other]

    cs.LG math.OC stat.ML

    Accelerating Smooth Games by Manipulating Spectral Shapes

    Authors: Waïss Azizian, Damien Scieur, Ioannis Mitliagkas, Simon Lacoste-Julien, Gauthier Gidel

    Abstract: We use matrix iteration theory to characterize acceleration in smooth games. We define the spectral shape of a family of games as the set containing all eigenvalues of the Jacobians of standard gradient dynamics in the family. Shapes restricted to the real line represent well-understood classes of problems, like minimization. Shapes spanning the complex plane capture the added numerical challenges…

    Submitted 9 March, 2020; v1 submitted 2 January, 2020; originally announced January 2020.

    Comments: Appears in: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020). 34 pages

    MSC Class: G.1.6; I.2.6

    ACM Class: G.1.6; I.2.6

  24. arXiv:1910.04920  [pdf, other]

    cs.LG math.OC stat.ML

    Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation

    Authors: Si Yi Meng, Sharan Vaswani, Issam Laradji, Mark Schmidt, Simon Lacoste-Julien

    Abstract: We consider stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models. Under this condition, we show that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size. By growing the batch size for both the subsampled gradient…

    Submitted 22 March, 2020; v1 submitted 10 October, 2019; originally announced October 2019.

    Comments: AISTATS, 2020

  25. arXiv:1906.08325  [pdf, other]

    cs.LG stat.ML

    GAIT: A Geometric Approach to Information Theory

    Authors: Jose Gallego-Posada, Ankit Vani, Max Schwarzer, Simon Lacoste-Julien

    Abstract: We advocate the use of a notion of entropy that reflects the relative abundances of the symbols in an alphabet, as well as the similarities between them. This concept was originally introduced in theoretical ecology to study the diversity of ecosystems. Based on this notion of entropy, we introduce geometry-aware counterparts for several concepts and theorems in information theory. Notably, our pr…

    Submitted 13 October, 2020; v1 submitted 19 June, 2019; originally announced June 2019.

    Comments: Appears in: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 2020. 19 pages

    Journal ref: PMLR (2020) 108:2601-2611

  26. arXiv:1906.05945  [pdf, other]

    cs.LG math.OC stat.ML

    A Tight and Unified Analysis of Gradient-Based Methods for a Whole Spectrum of Games

    Authors: Waïss Azizian, Ioannis Mitliagkas, Simon Lacoste-Julien, Gauthier Gidel

    Abstract: We consider differentiable games where the goal is to find a Nash equilibrium. The machine learning community has recently started using variants of the gradient method (GD). Prime examples are extragradient (EG), the optimistic gradient method (OG) and consensus optimization (CO), which enjoy linear convergence in cases like bilinear games, where the standard GD fails. The full benefits of these…

    Submitted 7 July, 2020; v1 submitted 13 June, 2019; originally announced June 2019.

    Comments: Appears in: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020). 39 pages. Minor modification regarding prior work in comparison to the AISTATS Proceedings

    ACM Class: G.1.6; I.2.6

  27. arXiv:1906.04848  [pdf, other]

    cs.LG stat.ML

    A Closer Look at the Optimization Landscapes of Generative Adversarial Networks

    Authors: Hugo Berard, Gauthier Gidel, Amjad Almahairi, Pascal Vincent, Simon Lacoste-Julien

    Abstract: Generative adversarial networks have been very successful in generative modeling; however, they remain relatively challenging to train compared to standard deep neural networks. In this paper, we propose new visualization techniques for the optimization landscapes of GANs that enable us to study the game vector field resulting from the concatenation of the gradient of both players. Using these visu…

    Submitted 27 April, 2020; v1 submitted 11 June, 2019; originally announced June 2019.

  28. arXiv:1906.02226  [pdf, other]

    cs.LG stat.ML

    Gradient-Based Neural DAG Learning

    Authors: Sébastien Lachapelle, Philippe Brouillard, Tristan Deleu, Simon Lacoste-Julien

    Abstract: We propose a novel score-based approach to learning a directed acyclic graph (DAG) from observational data. We adapt a recently proposed continuous constrained optimization formulation to allow for nonlinear relationships between variables using neural networks. This extension makes it possible to model complex interactions while avoiding the combinatorial nature of the problem. In addition to comparing our…

    Submitted 18 February, 2020; v1 submitted 5 June, 2019; originally announced June 2019.

    Comments: Appears in: Proceedings of the Eighth International Conference on Learning Representations (ICLR 2020). 23 pages

    ACM Class: I.2.6; I.5.1

  29. arXiv:1905.09997  [pdf, other]

    cs.LG math.OC stat.ML

    Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates

    Authors: Sharan Vaswani, Aaron Mishkin, Issam Laradji, Mark Schmidt, Gauthier Gidel, Simon Lacoste-Julien

    Abstract: Recent works have shown that stochastic gradient descent (SGD) achieves the fast convergence rates of full-batch gradient descent for over-parameterized models satisfying certain interpolation conditions. However, the step-size used in these works depends on unknown quantities and SGD's practical performance heavily relies on the choice of this step-size. We propose to use line-search techniques t…

    Submitted 4 June, 2021; v1 submitted 23 May, 2019; originally announced May 2019.

    Comments: Added a citation to the related work of Paul Tseng, and citations to methods that had previously explored line-searches for deep learning empirically

  30. arXiv:1904.13262  [pdf, other]

    cs.LG math.OC stat.ML

    Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks

    Authors: Gauthier Gidel, Francis Bach, Simon Lacoste-Julien

    Abstract: When optimizing over-parameterized models, such as deep neural networks, a large set of parameters can achieve zero training error. In such cases, the choice of the optimization algorithm and its respective hyper-parameters introduces biases that will lead to convergence to specific minimizers of the objective. Consequently, this choice can be considered as an implicit regularization for the train…

    Submitted 5 December, 2019; v1 submitted 30 April, 2019; originally announced April 2019.

    Comments: 19 pages, to appear in NeurIPS 2019 proceedings

  31. arXiv:1904.08598  [pdf, other]

    stat.ML cs.LG math.OC

    Reducing Noise in GAN Training with Variance Reduced Extragradient

    Authors: Tatjana Chavdarova, Gauthier Gidel, François Fleuret, Simon Lacoste-Julien

    Abstract: We study the effect of the stochastic gradient noise on the training of generative adversarial networks (GANs) and show that it can prevent the convergence of standard game optimization methods, while the batch version converges. We address this issue with a novel stochastic variance-reduced extragradient (SVRE) optimization algorithm, which for a large class of games improves upon the previous co…

    Submitted 25 June, 2020; v1 submitted 18 April, 2019; originally announced April 2019.

    Comments: latest NeurIPS'19 version

  32. arXiv:1902.08605  [pdf, other]

    cs.LG stat.ML

    Are Few-Shot Learning Benchmarks too Simple ? Solving them without Task Supervision at Test-Time

    Authors: Gabriel Huang, Hugo Larochelle, Simon Lacoste-Julien

    Abstract: We show that several popular few-shot learning benchmarks can be solved with varying degrees of success without using support set Labels at Test-time (LT). To this end, we introduce a new baseline called Centroid Networks, a modification of Prototypical Networks in which the support set labels are hidden from the method at test-time and have to be recovered through clustering. A benchmark that can…

    Submitted 24 July, 2020; v1 submitted 22 February, 2019; originally announced February 2019.

  33. arXiv:1901.07935   

    cs.LG math.OC stat.ML

    Predicting Tactical Solutions to Operational Planning Problems under Imperfect Information

    Authors: Eric Larsen, Sébastien Lachapelle, Yoshua Bengio, Emma Frejinger, Simon Lacoste-Julien, Andrea Lodi

    Abstract: This paper offers a methodological contribution at the intersection of machine learning and operations research. Namely, we propose a methodology to quickly predict tactical solutions to a given operational problem. In this context, the tactical solution is less detailed than the operational one but it has to be computed in very short time and under imperfect information. The problem is of importa…

    Submitted 1 March, 2021; v1 submitted 22 January, 2019; originally announced January 2019.

    Comments: Same as arXiv:1807.11876, added by mistake

    Journal ref: INFORMS Journal on Computing 34(1):227-242, 2021

  34. arXiv:1810.11544  [pdf, other]

    cs.LG cs.AI stat.ML

    Quantifying Learning Guarantees for Convex but Inconsistent Surrogates

    Authors: Kirill Struminsky, Simon Lacoste-Julien, Anton Osokin

    Abstract: We study consistency properties of machine learning methods based on minimizing convex surrogates. We extend the recent framework of Osokin et al. (2017) for the quantitative analysis of consistency properties to the case of inconsistent surrogates. Our key technical contribution consists in a new lower bound on the calibration function for the quadratic surrogate, which is non-trivial (not always…

    Submitted 9 January, 2019; v1 submitted 26 October, 2018; originally announced October 2018.

    Comments: Appears in: Advances in Neural Information Processing Systems 31 (NeurIPS 2018). 18 pages

  35. arXiv:1810.08591  [pdf, other]

    cs.LG stat.ML

    A Modern Take on the Bias-Variance Tradeoff in Neural Networks

    Authors: Brady Neal, Sarthak Mittal, Aristide Baratin, Vinayak Tantia, Matthew Scicluna, Simon Lacoste-Julien, Ioannis Mitliagkas

    Abstract: The bias-variance tradeoff tells us that as model complexity increases, bias falls and variance increases, leading to a U-shaped test error curve. However, recent empirical results with over-parameterized neural networks are marked by a striking absence of the classic U-shaped test error curve: test error keeps decreasing in wider networks. This suggests that there might not be a bias-variance tr…

    Submitted 18 December, 2019; v1 submitted 19 October, 2018; originally announced October 2018.

    Journal ref: ICML 2019 Workshop on Identifying and Understanding Deep Learning Phenomena

  36. arXiv:1809.06367  [pdf, other]

    cs.LG cs.CV stat.ML

    Scattering Networks for Hybrid Representation Learning

    Authors: Edouard Oyallon, Sergey Zagoruyko, Gabriel Huang, Nikos Komodakis, Simon Lacoste-Julien, Matthew Blaschko, Eugene Belilovsky

    Abstract: Scattering networks are a class of designed Convolutional Neural Networks (CNNs) with fixed weights. We argue they can serve as generic representations for modelling images. In particular, by working in scattering space, we achieve competitive results both for supervised and unsupervised learning tasks, while making progress towards constructing more interpretable CNNs. For supervised learning, we…

    Submitted 17 September, 2018; originally announced September 2018.

    Comments: arXiv admin note: substantial text overlap with arXiv:1703.08961

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, Institute of Electrical and Electronics Engineers, 2018, pp.11

  37. arXiv:1807.11876

    Predicting Tactical Solutions to Operational Planning Problems under Imperfect Information

    Authors: Eric Larsen, Sébastien Lachapelle, Yoshua Bengio, Emma Frejinger, Simon Lacoste-Julien, Andrea Lodi

    Abstract: This paper offers a methodological contribution at the intersection of machine learning and operations research. Namely, we propose a methodology to quickly predict expected tactical descriptions of operational solutions (TDOSs). The problem we address occurs in the context of two-stage stochastic programming where the second stage is demanding computationally. We aim to predict at a high speed th…

    Submitted 1 March, 2021; v1 submitted 31 July, 2018; originally announced July 2018.

    Journal ref: INFORMS Journal on Computing 34(1):227-242, 2021

  38. arXiv:1807.04740  [pdf, other]

    cs.LG stat.ML

    Negative Momentum for Improved Game Dynamics

    Authors: Gauthier Gidel, Reyhane Askari Hemmat, Mohammad Pezeshki, Remi Lepriol, Gabriel Huang, Simon Lacoste-Julien, Ioannis Mitliagkas

    Abstract: Games generalize the single-objective optimization paradigm by introducing different objective functions for different players. Differentiable games often proceed by simultaneous or alternating gradient updates. In machine learning, games are gaining new importance through formulations like generative adversarial networks (GANs) and actor-critic systems. However, compared to single-objective optim…

    Submitted 28 August, 2020; v1 submitted 12 July, 2018; originally announced July 2018.

    Comments: Appears in: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019). Minor changes with respect to the AISTATS version: typo corrected in Thm. 6 (squared condition number instead of condition number; and small change in constant) and dependence in $β$ changed in Theorem 5 for the formal statement; not changing the conclusions. 28 pages

    ACM Class: I.2.6; G.1.6

  39. arXiv:1804.03176  [pdf, other]

    math.OC cs.LG stat.ML

    Frank-Wolfe Splitting via Augmented Lagrangian Method

    Authors: Gauthier Gidel, Fabian Pedregosa, Simon Lacoste-Julien

    Abstract: Minimizing a function over an intersection of convex sets is an important task in optimization that is often much more challenging than minimizing it over each individual constraint set. While traditional methods such as Frank-Wolfe (FW) or proximal gradient descent assume access to a linear or quadratic oracle on the intersection, splitting techniques take advantage of the structure of each set,…

    Submitted 9 April, 2018; originally announced April 2018.

    Comments: Appears in: Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS 2018). 30 pages

    MSC Class: 90C52; 90C90; 68T05 ACM Class: G.1.6; I.2.6

  40. arXiv:1802.10551  [pdf, other]

    cs.LG math.OC stat.ML

    A Variational Inequality Perspective on Generative Adversarial Networks

    Authors: Gauthier Gidel, Hugo Berard, Gaëtan Vignoud, Pascal Vincent, Simon Lacoste-Julien

    Abstract: Generative adversarial networks (GANs) form a generative modeling approach known for producing appealing samples, but they are notably difficult to train. One common way to tackle this issue has been to propose new formulations of the GAN objective. Yet, surprisingly few studies have looked at optimization methods designed for this adversarial training. In this work, we cast GAN optimization probl…

    Submitted 28 August, 2020; v1 submitted 28 February, 2018; originally announced February 2018.

    Comments: Appears in: Proceedings of the Seventh International Conference on Learning Representations (ICLR 2019). Minor modifications with respect to the ICLR version (First paragraph of page 2 and section 3.3): New reference [Popov 1980] and discussion with regards to the novelty of extrapolation from the past. 38 pages

    ACM Class: I.2.6; G.1.6

  41. arXiv:1801.04055  [pdf, other]

    cs.LG stat.ML

    A3T: Adversarially Augmented Adversarial Training

    Authors: Akram Erraqabi, Aristide Baratin, Yoshua Bengio, Simon Lacoste-Julien

    Abstract: Recent research showed that deep neural networks are highly sensitive to so-called adversarial perturbations, which are tiny perturbations of the input data purposely designed to fool a machine learning classifier. Most classification models, including deep learning models, are highly vulnerable to adversarial attacks. In this work, we investigate a procedure to improve adversarial robustness of d…

    Submitted 11 January, 2018; originally announced January 2018.

    Comments: accepted for an oral presentation in Machine Deception Workshop, NIPS 2017

  42. arXiv:1801.03749  [pdf, other]

    math.OC cs.LG stat.ML

    Improved asynchronous parallel optimization analysis for stochastic incremental methods

    Authors: Rémi Leblond, Fabian Pedregosa, Simon Lacoste-Julien

    Abstract: As datasets continue to increase in size and multi-core computer architectures are developed, asynchronous parallel optimization algorithms become more and more essential to the field of Machine Learning. Unfortunately, conducting the theoretical analysis of asynchronous methods is difficult, notably due to the introduction of delay and inconsistency in inherently sequential algorithms. Handling thes…

    Submitted 21 March, 2019; v1 submitted 11 January, 2018; originally announced January 2018.

    Comments: 67 pages, published in JMLR, can be found online at http://jmlr.org/papers/v19/17-650.html. arXiv admin note: substantial text overlap with arXiv:1606.04809

  43. arXiv:1712.08577  [pdf, other]

    stat.ML cs.LG

    Adaptive Stochastic Dual Coordinate Ascent for Conditional Random Fields

    Authors: Rémi Le Priol, Alexandre Piché, Simon Lacoste-Julien

    Abstract: This work investigates the training of conditional random fields (CRFs) via the stochastic dual coordinate ascent (SDCA) algorithm of Shalev-Shwartz and Zhang (2016). SDCA enjoys a linear convergence rate and a strong empirical performance for binary classification problems. However, it has never been used to train CRFs. Yet it benefits from an `exact' line search with a single marginalization ora…

    Submitted 9 July, 2018; v1 submitted 22 December, 2017; originally announced December 2017.

    Comments: Published as a conference paper at UAI 2018. 22 pages

    MSC Class: 90C52; 90C90; 90C06; 68T05 ACM Class: G.1.6; I.2.6

  44. arXiv:1708.02511  [pdf, other]

    cs.LG stat.ML

    Parametric Adversarial Divergences are Good Losses for Generative Modeling

    Authors: Gabriel Huang, Hugo Berard, Ahmed Touati, Gauthier Gidel, Pascal Vincent, Simon Lacoste-Julien

    Abstract: Parametric adversarial divergences, which are a generalization of the losses used to train generative adversarial networks (GANs), have often been described as being approximations of their nonparametric counterparts, such as the Jensen-Shannon divergence, which can be derived under the so-called optimal discriminator assumption. In this position paper, we argue that despite being "non-optimal", p…

    Submitted 21 October, 2021; v1 submitted 8 August, 2017; originally announced August 2017.

  45. arXiv:1707.06468  [pdf, other]

    math.OC cs.LG stat.ML

    Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization

    Authors: Fabian Pedregosa, Rémi Leblond, Simon Lacoste-Julien

    Abstract: Due to their simplicity and excellent performance, parallel asynchronous variants of stochastic gradient descent have become popular methods to solve a wide range of large-scale optimization problems on multi-core architectures. Yet, despite their practical success, support for nonsmooth objectives is still lacking, making them unsuitable for many problems of interest in machine learning, such as…

    Submitted 5 November, 2017; v1 submitted 20 July, 2017; originally announced July 2017.

    Comments: Appears in Advances in Neural Information Processing Systems 30 (NIPS 2017), 28 pages

    MSC Class: 90C52; 90C90; 68T05 ACM Class: G.1.6; I.2.6

    Journal ref: Advances in Neural Information Processing Systems 30 (NIPS 2017)

  46. arXiv:1706.05394  [pdf, other]

    stat.ML cs.LG

    A Closer Look at Memorization in Deep Networks

    Authors: Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, Simon Lacoste-Julien

    Abstract: We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. r…

    Submitted 1 July, 2017; v1 submitted 16 June, 2017; originally announced June 2017.

    Comments: Appears in Proceedings of the 34th International Conference on Machine Learning (ICML 2017), Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, and David Krueger contributed equally to this work

  47. arXiv:1706.04499  [pdf, other]

    cs.LG stat.ML

    SEARNN: Training RNNs with Global-Local Losses

    Authors: Rémi Leblond, Jean-Baptiste Alayrac, Anton Osokin, Simon Lacoste-Julien

    Abstract: We propose SEARNN, a novel training algorithm for recurrent neural networks (RNNs) inspired by the "learning to search" (L2S) approach to structured prediction. RNNs have been widely successful in structured prediction applications such as machine translation or parsing, and are commonly trained using maximum likelihood estimation (MLE). Unfortunately, this training loss is not always an appropria…

    Submitted 4 March, 2018; v1 submitted 14 June, 2017; originally announced June 2017.

    Comments: Published as a conference paper at ICLR 2018, 16 pages

  48. arXiv:1703.02403  [pdf, other]

    cs.LG stat.ML

    On Structured Prediction Theory with Calibrated Convex Surrogate Losses

    Authors: Anton Osokin, Francis Bach, Simon Lacoste-Julien

    Abstract: We provide novel theoretical insights on structured prediction in the context of efficient convex surrogate loss minimization with consistency guarantees. For any task loss, we construct a convex surrogate that can be optimized via stochastic gradient descent and we prove tight bounds on the so-called "calibration function" relating the excess surrogate risk to the actual risk. In contrast to prio…

    Submitted 29 January, 2018; v1 submitted 7 March, 2017; originally announced March 2017.

    Comments: Appears in: Advances in Neural Information Processing Systems 30 (NIPS 2017). 30 pages

  49. arXiv:1610.07797  [pdf, other]

    math.OC cs.LG stat.ML

    Frank-Wolfe Algorithms for Saddle Point Problems

    Authors: Gauthier Gidel, Tony Jebara, Simon Lacoste-Julien

    Abstract: We extend the Frank-Wolfe (FW) optimization algorithm to solve constrained smooth convex-concave saddle point (SP) problems. Remarkably, the method only requires access to linear minimization oracles. Leveraging recent advances in FW optimization, we provide the first proof of convergence of a FW-type saddle point solver over polytopes, thereby partially answering a 30 year-old conjecture. We also…

    Submitted 3 March, 2017; v1 submitted 25 October, 2016; originally announced October 2016.

    Comments: Appears in: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017). 39 pages

    MSC Class: 90C52; 90C90; 68T05 ACM Class: G.1.6; I.2.6

  50. arXiv:1607.00345  [pdf, other]

    math.OC cs.LG math.NA stat.ML

    Convergence Rate of Frank-Wolfe for Non-Convex Objectives

    Authors: Simon Lacoste-Julien

    Abstract: We give a simple proof that the Frank-Wolfe algorithm obtains a stationary point at a rate of $O(1/\sqrt{t})$ on non-convex objectives with a Lipschitz continuous gradient. Our analysis is affine invariant and is the first, to the best of our knowledge, giving a similar rate to what was already proven for projected gradient methods (though on slightly different measures of stationarity).

    Submitted 1 July, 2016; originally announced July 2016.

    Comments: 6 pages

    MSC Class: 90C52; 90C90; 68T05 ACM Class: G.1.6; I.2.6