Showing 1–50 of 82 results for author: Lacoste-Julien, S

  1. arXiv:2406.04558  [pdf, other]

    cs.LG math.OC

    On PI Controllers for Updating Lagrange Multipliers in Constrained Optimization

    Authors: Motahareh Sohrabi, Juan Ramirez, Tianyue H. Zhang, Simon Lacoste-Julien, Jose Gallego-Posada

    Abstract: Constrained optimization offers a powerful framework to prescribe desired behaviors in neural network models. Typically, constrained problems are solved via their min-max Lagrangian formulations, which exhibit unstable oscillatory dynamics when optimized using gradient descent-ascent. The adoption of constrained optimization techniques in the machine learning community is currently limited by the…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Published at ICML 2024. Code available at https://github.com/motahareh-sohrabi/nuPI

  2. arXiv:2401.04890  [pdf, other]

    stat.ML cs.LG

    Nonparametric Partial Disentanglement via Mechanism Sparsity: Sparse Actions, Interventions and Sparse Temporal Dependencies

    Authors: Sébastien Lachapelle, Pau Rodríguez López, Yash Sharma, Katie Everett, Rémi Le Priol, Alexandre Lacoste, Simon Lacoste-Julien

    Abstract: This work introduces a novel principle for disentanglement we call mechanism sparsity regularization, which applies when the latent factors of interest depend sparsely on observed auxiliary variables and/or past latent factors. We propose a representation learning method that induces disentanglement by simultaneously learning the latent factors and the sparse causal graphical model that explains t…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: 88 pages

    ACM Class: I.2.6; I.5.1

  3. arXiv:2311.03096  [pdf, other]

    cs.LG stat.ML

    Weight-Sharing Regularization

    Authors: Mehran Shakerinava, Motahareh Sohrabi, Siamak Ravanbakhsh, Simon Lacoste-Julien

    Abstract: Weight-sharing is ubiquitous in deep learning. Motivated by this, we propose a "weight-sharing regularization" penalty on the weights $w \in \mathbb{R}^d$ of a neural network, defined as $\mathcal{R}(w) = \frac{1}{d - 1}\sum_{i > j}^d |w_i - w_j|$. We study the proximal mapping of $\mathcal{R}$ and provide an intuitive interpretation of it in terms of a physical system of interacting particles. We…

    Submitted 10 March, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: Our code is available at https://github.com/motahareh-sohrabi/weight-sharing-regularization
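
The penalty defined in the abstract above can be evaluated directly from its formula. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the sorted variant relies on the standard identity that the sum of pairwise absolute differences of sorted values $s_0 \le \dots \le s_{d-1}$ equals $\sum_k (2k - d + 1)\, s_k$.

```python
def weight_sharing_penalty(w):
    """R(w) = 1/(d-1) * sum_{i>j} |w_i - w_j|, computed pair by pair (O(d^2))."""
    d = len(w)
    if d < 2:
        raise ValueError("the penalty needs at least two weights (d - 1 > 0)")
    total = 0.0
    for i in range(d):
        for j in range(i):  # every unordered pair with i > j, counted once
            total += abs(w[i] - w[j])
    return total / (d - 1)


def weight_sharing_penalty_sorted(w):
    """Same value in O(d log d): after sorting, s_k appears with sign (2k - d + 1)."""
    d = len(w)
    if d < 2:
        raise ValueError("the penalty needs at least two weights (d - 1 > 0)")
    s = sorted(w)
    total = sum((2 * k - d + 1) * s[k] for k in range(d))
    return total / (d - 1)
```

The penalty is zero exactly when all weights are equal, which matches its role as a regularizer that encourages shared weights.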

  4. arXiv:2310.20673  [pdf, other]

    cs.LG cs.CY

    Balancing Act: Constraining Disparate Impact in Sparse Models

    Authors: Meraj Hashemizadeh, Juan Ramirez, Rohan Sukumaran, Golnoosh Farnadi, Simon Lacoste-Julien, Jose Gallego-Posada

    Abstract: Model pruning is a popular approach to enable the deployment of large deep learning models on edge devices with restricted computational or storage capacities. Although sparse models achieve performance comparable to that of their dense counterparts at the level of the entire dataset, they exhibit high accuracy drops for some data sub-groups. Existing methods to mitigate this disparate impact indu…

    Submitted 7 March, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: Published at ICLR 2024. Code available at https://github.com/merajhashemi/balancing-act

  5. arXiv:2307.09638  [pdf, other]

    cs.LG cs.AI

    Promoting Exploration in Memory-Augmented Adam using Critical Momenta

    Authors: Pranshu Malviya, Gonçalo Mordido, Aristide Baratin, Reza Babanezhad Harikandeh, Jerry Huang, Simon Lacoste-Julien, Razvan Pascanu, Sarath Chandar

    Abstract: Adaptive gradient-based optimizers, notably Adam, have left their mark in training large-scale deep learning models, offering fast convergence and robustness to hyperparameter settings. However, they often struggle with generalization, attributed to their tendency to converge to sharp minima in the loss landscape. To address this, we propose a new memory-augmented version of Adam that encourages e…

    Submitted 17 June, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

    Comments: Published in Transactions on Machine Learning Research

  6. arXiv:2307.02598  [pdf, other]

    cs.LG stat.ML

    Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation

    Authors: Sébastien Lachapelle, Divyat Mahajan, Ioannis Mitliagkas, Simon Lacoste-Julien

    Abstract: We tackle the problems of latent variables identification and ``out-of-support'' image generation in representation learning. We show that both are possible for a class of decoders that we call additive, which are reminiscent of decoders used for object-centric representation learning (OCRL) and well suited for images that can be decomposed as a sum of object-specific images. We provide conditions…

    Submitted 2 November, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: Appears in: Advances in Neural Information Processing Systems 36 (NeurIPS 2023). 39 pages

    ACM Class: I.2.6; I.5.1

  7. arXiv:2306.16334  [pdf, other]

    cs.LG cs.AI

    On the Identifiability of Quantized Factors

    Authors: Vitória Barin-Pacela, Kartik Ahuja, Simon Lacoste-Julien, Pascal Vincent

    Abstract: Disentanglement aims to recover meaningful latent ground-truth factors from the observed distribution solely, and is formalized through the theory of identifiability. The identifiability of independent latent factors is proven to be impossible in the unsupervised i.i.d. setting under a general nonlinear map from factors to observations. In this work, however, we demonstrate that it is possible to…

    Submitted 12 March, 2024; v1 submitted 28 June, 2023; originally announced June 2023.

    Comments: Appears in: 3rd Conference on Causal Learning and Reasoning (CLeaR 2024). 39 pages

  8. arXiv:2304.03094  [pdf, other]

    cs.LG cs.CV

    PopulAtion Parameter Averaging (PAPA)

    Authors: Alexia Jolicoeur-Martineau, Emy Gervais, Kilian Fatras, Yan Zhang, Simon Lacoste-Julien

    Abstract: Ensemble methods combine the predictions of multiple models to improve performance, but they require significantly higher computation costs at inference time. To avoid these costs, multiple neural networks can be combined into one by averaging their weights. However, this usually performs significantly worse than ensembling. Weight averaging is only beneficial when different enough to benefit from…

    Submitted 6 May, 2024; v1 submitted 6 April, 2023; originally announced April 2023.

    Comments: Blog post: https://ajolicoeur.wordpress.com/papa/, Code: https://github.com/SamsungSAILMontreal/PAPA, TMLR journal publication: https://openreview.net/forum?id=cPDVjsOytS

  9. arXiv:2303.04143  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?

    Authors: Boris Knyazev, Doha Hwang, Simon Lacoste-Julien

    Abstract: Pretraining a neural network on a large dataset is becoming a cornerstone in machine learning that is within the reach of only a few communities with large-resources. We aim at an ambitious goal of democratizing pretraining. Towards that goal, we train and release a single neural network that can predict high quality ImageNet parameters of other neural networks. By using predicted parameters for i…

    Submitted 31 May, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: ICML 2023, camera ready (7 tables with extra results added), code and models are at https://github.com/SamsungSAILMontreal/ghn3

  10. arXiv:2301.13197  [pdf, other]

    cs.LG cs.CV

    Unlocking Slot Attention by Changing Optimal Transport Costs

    Authors: Yan Zhang, David W. Zhang, Simon Lacoste-Julien, Gertjan J. Burghouts, Cees G. M. Snoek

    Abstract: Slot attention is a powerful method for object-centric modeling in images and videos. However, its set-equivariance limits its ability to handle videos with a dynamic number of objects because it cannot break ties. To overcome this limitation, we first establish a connection between slot attention and optimal transport. Based on this new perspective we propose MESH (Minimize Entropy of Sinkhorn):…

    Submitted 31 May, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

    Comments: Published at International Conference on Machine Learning (ICML) 2023

  11. arXiv:2212.01674  [pdf, other]

    cs.CV cs.AI cs.LG

    CrossSplit: Mitigating Label Noise Memorization through Data Splitting

    Authors: Jihye Kim, Aristide Baratin, Yan Zhang, Simon Lacoste-Julien

    Abstract: We approach the problem of improving robustness of deep learning algorithms in the presence of label noise. Building upon existing label correction and co-teaching methods, we propose a novel training procedure to mitigate the memorization of noisy labels, called CrossSplit, which uses a pair of neural networks trained on two disjoint parts of the labelled dataset. CrossSplit combines two main ing…

    Submitted 26 April, 2023; v1 submitted 3 December, 2022; originally announced December 2022.

    Comments: Accepted to ICML 2023

  12. arXiv:2211.14666  [pdf, other]

    cs.LG stat.ML

    Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning

    Authors: Sébastien Lachapelle, Tristan Deleu, Divyat Mahajan, Ioannis Mitliagkas, Yoshua Bengio, Simon Lacoste-Julien, Quentin Bertrand

    Abstract: Although disentangled representations are often said to be beneficial for downstream tasks, current empirical and theoretical understanding is limited. In this work, we provide evidence that disentangled representations coupled with sparse base-predictors improve generalization. In the context of multi-task learning, we prove a new identifiability result that provides conditions under which maxima…

    Submitted 6 June, 2023; v1 submitted 26 November, 2022; originally announced November 2022.

    Comments: Appears in: Fortieth International Conference on Machine Learning (ICML 2023). 36 pages

    ACM Class: I.2.6; I.5.1

  13. arXiv:2208.04425  [pdf, other]

    cs.LG

    Controlled Sparsity via Constrained Optimization or: How I Learned to Stop Tuning Penalties and Love Constraints

    Authors: Jose Gallego-Posada, Juan Ramirez, Akram Erraqabi, Yoshua Bengio, Simon Lacoste-Julien

    Abstract: The performance of trained neural networks is robust to harsh levels of pruning. Coupled with the ever-growing size of deep learning models, this observation has motivated extensive research on learning sparse models. In this work, we focus on the task of controlling the level of sparsity when performing sparse learning. Existing methods based on sparsity-inducing penalties involve expensive trial…

    Submitted 27 November, 2022; v1 submitted 8 August, 2022; originally announced August 2022.

    Comments: NeurIPS 2022 - Code available at https://github.com/gallego-posada/constrained_sparsity

  14. arXiv:2207.07732  [pdf, other]

    stat.ML cs.LG

    Partial Disentanglement via Mechanism Sparsity

    Authors: Sébastien Lachapelle, Simon Lacoste-Julien

    Abstract: Disentanglement via mechanism sparsity was introduced recently as a principled approach to extract latent factors without supervision when the causal graph relating them in time is sparse, and/or when actions are observed and affect them sparsely. However, this theory applies only to ground-truth graphs satisfying a specific criterion. In this work, we introduce a generalization of this theory whi…

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: Appears in: The First Workshop on Causal Representation Learning (CRL 2022) at UAI. 26 pages

  15. arXiv:2203.04940  [pdf, other]

    cs.LG cs.AI cs.DM math.OC

    Data-Efficient Structured Pruning via Submodular Optimization

    Authors: Marwa El Halabi, Suraj Srinivas, Simon Lacoste-Julien

    Abstract: Structured pruning is an effective approach for compressing large pre-trained neural networks without significantly affecting their performance. However, most current structured pruning methods do not provide any performance guarantees, and often require fine-tuning, which makes them inapplicable in the limited-data regime. We propose a principled data-efficient structured pruning method based on…

    Submitted 10 February, 2023; v1 submitted 9 March, 2022; originally announced March 2022.

  16. arXiv:2202.13903  [pdf, other]

    cs.LG stat.ML

    Bayesian Structure Learning with Generative Flow Networks

    Authors: Tristan Deleu, António Góis, Chris Emezue, Mansi Rankawat, Simon Lacoste-Julien, Stefan Bauer, Yoshua Bengio

    Abstract: In Bayesian structure learning, we are interested in inferring a distribution over the directed acyclic graph (DAG) structure of Bayesian networks, from data. Defining such a distribution is very challenging, due to the combinatorially large sample space, and approximations based on MCMC are often required. Recently, a novel class of probabilistic models, called Generative Flow Networks (GFlowNets…

    Submitted 28 June, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

  17. arXiv:2111.12193  [pdf, other]

    cs.LG stat.ML

    Multiset-Equivariant Set Prediction with Approximate Implicit Differentiation

    Authors: Yan Zhang, David W. Zhang, Simon Lacoste-Julien, Gertjan J. Burghouts, Cees G. M. Snoek

    Abstract: Most set prediction models in deep learning use set-equivariant operations, but they actually operate on multisets. We show that set-equivariant functions cannot represent certain functions on multisets, so we introduce the more appropriate notion of multiset-equivariance. We identify that the existing Deep Set Prediction Network (DSPN) can be multiset-equivariant without being hindered by set-equ…

    Submitted 3 February, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

    Comments: Published at International Conference on Learning Representations (ICLR) 2022

  18. arXiv:2111.06826  [pdf, other]

    stat.ML cs.LG math.ST

    Convergence Rates for the MAP of an Exponential Family and Stochastic Mirror Descent -- an Open Problem

    Authors: Rémi Le Priol, Frederik Kunstner, Damien Scieur, Simon Lacoste-Julien

    Abstract: We consider the problem of upper bounding the expected log-likelihood sub-optimality of the maximum likelihood estimate (MLE), or a conjugate maximum a posteriori (MAP) for an exponential family, in a non-asymptotic way. Surprisingly, we found no general solution to this problem in the literature. In particular, current theories do not hold for a Gaussian or in the interesting few samples regime.…

    Submitted 12 November, 2021; originally announced November 2021.

    Comments: 9 pages and 3 figures + Appendix

  19. arXiv:2110.14711  [pdf, other]

    cs.CV cs.AI cs.LG

    A Survey of Self-Supervised and Few-Shot Object Detection

    Authors: Gabriel Huang, Issam Laradji, David Vazquez, Simon Lacoste-Julien, Pau Rodriguez

    Abstract: Labeling data is often expensive and time-consuming, especially for tasks such as object detection and instance segmentation, which require dense labeling of the image. While few-shot object detection is about training a model on novel (unseen) object classes with little data, it still requires prior training on many labeled examples of base (seen) classes. On the other hand, self-supervised metho…

    Submitted 23 August, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence. Awesome Few-Shot Object Detection (Leaderboard) at https://github.com/gabrielhuang/awesome-few-shot-object-detection

  20. arXiv:2107.10098  [pdf, other]

    stat.ML cs.LG

    Disentanglement via Mechanism Sparsity Regularization: A New Principle for Nonlinear ICA

    Authors: Sébastien Lachapelle, Pau Rodríguez López, Yash Sharma, Katie Everett, Rémi Le Priol, Alexandre Lacoste, Simon Lacoste-Julien

    Abstract: This work introduces a novel principle we call disentanglement via mechanism sparsity regularization, which can be applied when the latent factors of interest depend sparsely on past latent factors and/or observed auxiliary variables. We propose a representation learning method that induces disentanglement by simultaneously learning the latent factors and the sparse causal graphical model that rel…

    Submitted 23 February, 2022; v1 submitted 21 July, 2021; originally announced July 2021.

    Comments: Appears in: 1st Conference on Causal Learning and Reasoning (CLeaR 2022). 57 pages

    ACM Class: I.2.6; I.5.1

  21. arXiv:2107.00052  [pdf, other]

    cs.LG cs.GT math.OC stat.ML

    Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity

    Authors: Nicolas Loizou, Hugo Berard, Gauthier Gidel, Ioannis Mitliagkas, Simon Lacoste-Julien

    Abstract: Two of the most prominent algorithms for solving unconstrained smooth games are the classical stochastic gradient descent-ascent (SGDA) and the recently introduced stochastic consensus optimization (SCO) [Mescheder et al., 2017]. SGDA is known to converge to a stationary point for specific classes of games, but current convergence analyses require a bounded variance assumption. SCO is used success…

    Submitted 4 November, 2021; v1 submitted 30 June, 2021; originally announced July 2021.

    Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  22. arXiv:2105.11646  [pdf, other]

    cs.LG

    Structured Convolutional Kernel Networks for Airline Crew Scheduling

    Authors: Yassine Yaakoubi, François Soumis, Simon Lacoste-Julien

    Abstract: Motivated by the needs from an airline crew scheduling application, we introduce structured convolutional kernel networks (Struct-CKN), which combine CKNs from Mairal et al. (2014) in a structured prediction framework that supports constraints on the outputs. CKNs are a particular kind of convolutional neural networks that approximate a kernel feature map on training data, thus combining propertie…

    Submitted 22 July, 2021; v1 submitted 24 May, 2021; originally announced May 2021.

    Comments: ICML 2021 (Proceedings of the 38th International Conference on Machine Learning, PMLR 139:11626-11636)

  23. arXiv:2103.09027  [pdf, other]

    cs.LG cs.CV

    Repurposing Pretrained Models for Robust Out-of-domain Few-Shot Learning

    Authors: Namyeong Kwon, Hwidong Na, Gabriel Huang, Simon Lacoste-Julien

    Abstract: Model-agnostic meta-learning (MAML) is a popular method for few-shot learning but assumes that we have access to the meta-training set. In practice, training on the meta-training set may not always be an option due to data privacy concerns, intellectual property issues, or merely lack of computing resources. In this paper, we consider the novel problem of repurposing pretrained MAML checkpoints to…

    Submitted 16 March, 2021; originally announced March 2021.

    Comments: Appears in: Proceedings of the Ninth International Conference on Learning Representations (ICLR 2021). 20 pages

  24. arXiv:2103.02014  [pdf, other]

    cs.LG cs.CR cs.DS

    Online Adversarial Attacks

    Authors: Andjela Mladenovic, Avishek Joey Bose, Hugo Berard, William L. Hamilton, Simon Lacoste-Julien, Pascal Vincent, Gauthier Gidel

    Abstract: Adversarial attacks expose important vulnerabilities of deep learning models, yet little attention has been paid to settings where data arrives as a stream. In this paper, we formalize the online adversarial attack problem, emphasizing two key elements found in real-world use-cases: attackers must operate under partial knowledge of the target model, and the decisions made by the attacker are irrev…

    Submitted 22 March, 2022; v1 submitted 2 March, 2021; originally announced March 2021.

    Comments: ICLR 2022

  25. arXiv:2102.09645  [pdf, other]

    cs.LG math.OC stat.ML

    SVRG Meets AdaGrad: Painless Variance Reduction

    Authors: Benjamin Dubois-Taine, Sharan Vaswani, Reza Babanezhad, Mark Schmidt, Simon Lacoste-Julien

    Abstract: Variance reduction (VR) methods for finite-sum minimization typically require the knowledge of problem-dependent constants that are often unknown and difficult to estimate. To address this, we use ideas from adaptive gradient methods to propose AdaSVRG, which is a more robust variant of SVRG, a common VR method. AdaSVRG uses AdaGrad in the inner loop of SVRG, making it robust to the choice of step…

    Submitted 2 November, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

  26. arXiv:2011.11203  [pdf, ps, other]

    cs.LG

    Geometry-Aware Universal Mirror-Prox

    Authors: Reza Babanezhad, Simon Lacoste-Julien

    Abstract: Mirror-prox (MP) is a well-known algorithm to solve variational inequality (VI) problems. VI with a monotone operator covers a large group of settings such as convex minimization, min-max or saddle point problems. To get a convergent algorithm, the step-size of the classic MP algorithm relies heavily on the problem dependent knowledge of the operator such as its smoothness parameter which is hard…

    Submitted 22 November, 2020; originally announced November 2020.

  27. arXiv:2011.11150  [pdf, other]

    cs.LG stat.ML

    On the Convergence of Continuous Constrained Optimization for Structure Learning

    Authors: Ignavier Ng, Sébastien Lachapelle, Nan Rosemary Ke, Simon Lacoste-Julien, Kun Zhang

    Abstract: Recently, structure learning of directed acyclic graphs (DAGs) has been formulated as a continuous optimization problem by leveraging an algebraic characterization of acyclicity. The constrained problem is solved using the augmented Lagrangian method (ALM) which is often preferred to the quadratic penalty method (QPM) by virtue of its standard convergence result that does not require the penalty c…

    Submitted 10 April, 2022; v1 submitted 22 November, 2020; originally announced November 2020.

    Comments: AISTATS 2022. A preliminary version of this paper was presented at the NeurIPS 2020 Workshop on Causal Discovery and Causality-Inspired Machine Learning. The code is available at https://github.com/ignavierng/notears-convergence

  28. Machine Learning in Airline Crew Pairing to Construct Initial Clusters for Dynamic Constraint Aggregation

    Authors: Yassine Yaakoubi, François Soumis, Simon Lacoste-Julien

    Abstract: The crew pairing problem (CPP) is generally modelled as a set partitioning problem where the flights have to be partitioned in pairings. A pairing is a sequence of flight legs separated by connection time and rest periods that starts and ends at the same base. Because of the extensive list of complex rules and regulations, determining whether a sequence of flights constitutes a feasible pairing ca…

    Submitted 30 September, 2020; originally announced October 2020.

    Comments: First publication in the "Cahiers du GERAD" series in February 2020. Submitted to EURO Journal on Transportation and Logistics on January 17, 2020 and available online on September 2, 2020

    Journal ref: EURO Journal on Transportation and Logistics, 100020 (2020)

  29. arXiv:2009.12501  [pdf, other]

    cs.LG math.OC stat.ML

    Flight-connection Prediction for Airline Crew Scheduling to Construct Initial Clusters for OR Optimizer

    Authors: Yassine Yaakoubi, François Soumis, Simon Lacoste-Julien

    Abstract: We present a case study of using machine learning classification algorithms to initialize a large-scale commercial solver (GENCOL) based on column generation in the context of the airline crew pairing problem, where small savings of as little as 1% translate to increasing annual revenue by dozens of millions of dollars in a large airline. Under the imitation learning framework, we focus on the pro…

    Submitted 2 March, 2021; v1 submitted 25 September, 2020; originally announced September 2020.

    Comments: First publication on the "Cahiers du GERAD" series in April 2019

    Report number: G-2019-26

  30. arXiv:2008.00938  [pdf, other]

    cs.LG stat.ML

    Implicit Regularization via Neural Feature Alignment

    Authors: Aristide Baratin, Thomas George, César Laurent, R Devon Hjelm, Guillaume Lajoie, Pascal Vincent, Simon Lacoste-Julien

    Abstract: We approach the problem of implicit regularization in deep learning from a geometrical viewpoint. We highlight a regularization effect induced by a dynamical alignment of the neural tangent features introduced by Jacot et al, along a small number of task-relevant directions. This can be interpreted as a combined mechanism of feature selection and compression. By extrapolating a new analysis of Rad…

    Submitted 16 March, 2021; v1 submitted 3 August, 2020; originally announced August 2020.

    Comments: AISTATS 2021

  31. arXiv:2007.04202  [pdf, other]

    cs.LG cs.GT math.OC stat.ML

    Stochastic Hamiltonian Gradient Methods for Smooth Games

    Authors: Nicolas Loizou, Hugo Berard, Alexia Jolicoeur-Martineau, Pascal Vincent, Simon Lacoste-Julien, Ioannis Mitliagkas

    Abstract: The success of adversarial formulations in machine learning has brought renewed motivation for smooth games. In this work, we focus on the class of stochastic Hamiltonian methods and provide the first convergence guarantees for certain classes of stochastic smooth games. We propose a novel unbiased estimator for the stochastic Hamiltonian gradient descent (SHGD) and highlight its benefits. Using t…

    Submitted 8 July, 2020; originally announced July 2020.

    Comments: ICML 2020 - Proceedings of the 37th International Conference on Machine Learning

  32. arXiv:2007.01754  [pdf, other]

    cs.LG stat.ML

    Differentiable Causal Discovery from Interventional Data

    Authors: Philippe Brouillard, Sébastien Lachapelle, Alexandre Lacoste, Simon Lacoste-Julien, Alexandre Drouin

    Abstract: Learning a causal directed acyclic graph from data is a challenging task that involves solving a combinatorial problem for which the solution is not always identifiable. A new line of work reformulates this problem as a continuous constrained optimization one, which is solved via the augmented Lagrangian method. However, most methods based on this idea do not make use of interventional data, which…

    Submitted 3 November, 2020; v1 submitted 3 July, 2020; originally announced July 2020.

    Comments: Appears in: Advances in Neural Information Processing Systems 33 (NeurIPS 2020). 46 pages

    ACM Class: I.2.6; I.5.1

  33. arXiv:2007.00720  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Adversarial Example Games

    Authors: Avishek Joey Bose, Gauthier Gidel, Hugo Berard, Andre Cianflone, Pascal Vincent, Simon Lacoste-Julien, William L. Hamilton

    Abstract: The existence of adversarial examples capable of fooling trained neural network classifiers calls for a much better understanding of possible attacks to guide the development of safeguards against them. This includes attack methods in the challenging non-interactive blackbox setting, where adversarial attacks are generated without any access, including queries, to the target model. Prior attacks i…

    Submitted 8 January, 2021; v1 submitted 1 July, 2020; originally announced July 2020.

    Comments: Appears in: Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

  34. arXiv:2006.06835  [pdf, other]

    cs.LG math.OC stat.ML

    Adaptive Gradient Methods Converge Faster with Over-Parameterization (but you should do a line-search)

    Authors: Sharan Vaswani, Issam Laradji, Frederik Kunstner, Si Yi Meng, Mark Schmidt, Simon Lacoste-Julien

    Abstract: Adaptive gradient methods are typically used for training over-parameterized models. To better understand their behaviour, we study a simplistic setting -- smooth, convex losses with models over-parameterized enough to interpolate the data. In this setting, we prove that AMSGrad with constant step-size and momentum converges to the minimizer at a faster $O(1/T)$ rate. When interpolation is only ap…

    Submitted 18 February, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

  35. arXiv:2006.06821  [pdf, other]

    cs.LG stat.ML

    To Each Optimizer a Norm, To Each Norm its Generalization

    Authors: Sharan Vaswani, Reza Babanezhad, Jose Gallego-Posada, Aaron Mishkin, Simon Lacoste-Julien, Nicolas Le Roux

    Abstract: We study the implicit regularization of optimization methods for linear models interpolating the training data in the under-parametrized and over-parametrized regimes. Since it is difficult to determine whether an optimizer converges to solutions that minimize a known norm, we flip the problem and investigate what is the corresponding norm minimized by an interpolating solution. Using this reasoni…

    Submitted 11 June, 2020; originally announced June 2020.

  36. arXiv:2005.09136  [pdf, other]

    stat.ML cs.LG

    An Analysis of the Adaptation Speed of Causal Models

    Authors: Rémi Le Priol, Reza Babanezhad Harikandeh, Yoshua Bengio, Simon Lacoste-Julien

    Abstract: Consider a collection of datasets generated by unknown interventions on an unknown structural causal model $G$. Recently, Bengio et al. (2020) conjectured that among all candidate models, $G$ is the fastest to adapt from one dataset to another, along with promising experiments. Indeed, intuitively $G$ has less mechanisms to adapt, but this justification is incomplete. Our contribution is a more th…

    Submitted 25 February, 2021; v1 submitted 18 May, 2020; originally announced May 2020.

    Comments: Published at AISTATS 2021. 10 pages main articles, 19 pages supplement, 10 figures

  37. arXiv:2002.10542  [pdf, other]

    math.OC cs.LG stat.ML

    Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence

    Authors: Nicolas Loizou, Sharan Vaswani, Issam Laradji, Simon Lacoste-Julien

    Abstract: We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly used in the subgradient method. Although computing the Polyak step-size requires knowledge of the optimal function values, this information is readily available for typical modern machine learning applications. Consequently, the proposed stochastic Polyak step-size (SPS) is an attractive choice for setting th…

    Submitted 22 March, 2021; v1 submitted 24 February, 2020; originally announced February 2020.

    Comments: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021

  38. arXiv:2001.00602  [pdf, other]

    cs.LG math.OC stat.ML

    Accelerating Smooth Games by Manipulating Spectral Shapes

    Authors: Waïss Azizian, Damien Scieur, Ioannis Mitliagkas, Simon Lacoste-Julien, Gauthier Gidel

    Abstract: We use matrix iteration theory to characterize acceleration in smooth games. We define the spectral shape of a family of games as the set containing all eigenvalues of the Jacobians of standard gradient dynamics in the family. Shapes restricted to the real line represent well-understood classes of problems, like minimization. Shapes spanning the complex plane capture the added numerical challenges…

    Submitted 9 March, 2020; v1 submitted 2 January, 2020; originally announced January 2020.

    Comments: Appears in: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020). 34 pages

    MSC Class: G.1.6; I.2.6 ACM Class: G.1.6; I.2.6

  39. arXiv:1910.04920  [pdf, other]

    cs.LG math.OC stat.ML

    Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation

    Authors: Si Yi Meng, Sharan Vaswani, Issam Laradji, Mark Schmidt, Simon Lacoste-Julien

    Abstract: We consider stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models. Under this condition, we show that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size. By growing the batch size for both the subsampled gradient…

    Submitted 22 March, 2020; v1 submitted 10 October, 2019; originally announced October 2019.

    Comments: AISTATS, 2020

  40. arXiv:1906.08325  [pdf, other]

    cs.LG stat.ML

    GAIT: A Geometric Approach to Information Theory

    Authors: Jose Gallego-Posada, Ankit Vani, Max Schwarzer, Simon Lacoste-Julien

    Abstract: We advocate the use of a notion of entropy that reflects the relative abundances of the symbols in an alphabet, as well as the similarities between them. This concept was originally introduced in theoretical ecology to study the diversity of ecosystems. Based on this notion of entropy, we introduce geometry-aware counterparts for several concepts and theorems in information theory. Notably, our pr…

    Submitted 13 October, 2020; v1 submitted 19 June, 2019; originally announced June 2019.

    Comments: Appears in: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 2020. 19 pages

    Journal ref: PMLR (2020) 108:2601-2611

  41. arXiv:1906.05945  [pdf, other]

    cs.LG math.OC stat.ML

    A Tight and Unified Analysis of Gradient-Based Methods for a Whole Spectrum of Games

    Authors: Waïss Azizian, Ioannis Mitliagkas, Simon Lacoste-Julien, Gauthier Gidel

    Abstract: We consider differentiable games where the goal is to find a Nash equilibrium. The machine learning community has recently started using variants of the gradient method (GD). Prime examples are extragradient (EG), the optimistic gradient method (OG) and consensus optimization (CO), which enjoy linear convergence in cases like bilinear games, where the standard GD fails. The full benefits of these…

    Submitted 7 July, 2020; v1 submitted 13 June, 2019; originally announced June 2019.

    Comments: Appears in: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020). 39 pages. Minor modification regarding prior work in comparison to the AISTATS Proceedings

    ACM Class: G.1.6; I.2.6

  42. arXiv:1906.04848  [pdf, other]

    cs.LG stat.ML

    A Closer Look at the Optimization Landscapes of Generative Adversarial Networks

    Authors: Hugo Berard, Gauthier Gidel, Amjad Almahairi, Pascal Vincent, Simon Lacoste-Julien

    Abstract: Generative adversarial networks have been very successful in generative modeling, however they remain relatively challenging to train compared to standard deep neural networks. In this paper, we propose new visualization techniques for the optimization landscapes of GANs that enable us to study the game vector field resulting from the concatenation of the gradient of both players. Using these visu…

    Submitted 27 April, 2020; v1 submitted 11 June, 2019; originally announced June 2019.

  43. arXiv:1906.02226  [pdf, other]

    cs.LG stat.ML

    Gradient-Based Neural DAG Learning

    Authors: Sébastien Lachapelle, Philippe Brouillard, Tristan Deleu, Simon Lacoste-Julien

    Abstract: We propose a novel score-based approach to learning a directed acyclic graph (DAG) from observational data. We adapt a recently proposed continuous constrained optimization formulation to allow for nonlinear relationships between variables using neural networks. This extension allows modeling complex interactions while avoiding the combinatorial nature of the problem. In addition to comparing our…

    Submitted 18 February, 2020; v1 submitted 5 June, 2019; originally announced June 2019.

    Comments: Appears in: Proceedings of the Eighth International Conference on Learning Representations (ICLR 2020). 23 pages

    ACM Class: I.2.6; I.5.1

  44. arXiv:1905.09997  [pdf, other]

    cs.LG math.OC stat.ML

    Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates

    Authors: Sharan Vaswani, Aaron Mishkin, Issam Laradji, Mark Schmidt, Gauthier Gidel, Simon Lacoste-Julien

    Abstract: Recent works have shown that stochastic gradient descent (SGD) achieves the fast convergence rates of full-batch gradient descent for over-parameterized models satisfying certain interpolation conditions. However, the step-size used in these works depends on unknown quantities and SGD's practical performance heavily relies on the choice of this step-size. We propose to use line-search techniques t…

    Submitted 4 June, 2021; v1 submitted 23 May, 2019; originally announced May 2019.

    Comments: Added a citation to the related work of Paul Tseng, and citations to methods that had previously explored line-searches for deep learning empirically

  45. arXiv:1904.13262  [pdf, other]

    cs.LG math.OC stat.ML

    Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks

    Authors: Gauthier Gidel, Francis Bach, Simon Lacoste-Julien

    Abstract: When optimizing over-parameterized models, such as deep neural networks, a large set of parameters can achieve zero training error. In such cases, the choice of the optimization algorithm and its respective hyper-parameters introduces biases that will lead to convergence to specific minimizers of the objective. Consequently, this choice can be considered as an implicit regularization for the train…

    Submitted 5 December, 2019; v1 submitted 30 April, 2019; originally announced April 2019.

    Comments: 19 pages, to appear in NeurIPS 2019 proceedings

  46. arXiv:1904.08598  [pdf, other]

    stat.ML cs.LG math.OC

    Reducing Noise in GAN Training with Variance Reduced Extragradient

    Authors: Tatjana Chavdarova, Gauthier Gidel, François Fleuret, Simon Lacoste-Julien

    Abstract: We study the effect of the stochastic gradient noise on the training of generative adversarial networks (GANs) and show that it can prevent the convergence of standard game optimization methods, while the batch version converges. We address this issue with a novel stochastic variance-reduced extragradient (SVRE) optimization algorithm, which for a large class of games improves upon the previous co…

    Submitted 25 June, 2020; v1 submitted 18 April, 2019; originally announced April 2019.

    Comments: latest NeurIPS'19 version

  47. arXiv:1902.08605  [pdf, other]

    cs.LG stat.ML

    Are Few-Shot Learning Benchmarks too Simple ? Solving them without Task Supervision at Test-Time

    Authors: Gabriel Huang, Hugo Larochelle, Simon Lacoste-Julien

    Abstract: We show that several popular few-shot learning benchmarks can be solved with varying degrees of success without using support set Labels at Test-time (LT). To this end, we introduce a new baseline called Centroid Networks, a modification of Prototypical Networks in which the support set labels are hidden from the method at test-time and have to be recovered through clustering. A benchmark that can…

    Submitted 24 July, 2020; v1 submitted 22 February, 2019; originally announced February 2019.

  48. arXiv:1901.07935   

    cs.LG math.OC stat.ML

    Predicting Tactical Solutions to Operational Planning Problems under Imperfect Information

    Authors: Eric Larsen, Sébastien Lachapelle, Yoshua Bengio, Emma Frejinger, Simon Lacoste-Julien, Andrea Lodi

    Abstract: This paper offers a methodological contribution at the intersection of machine learning and operations research. Namely, we propose a methodology to quickly predict tactical solutions to a given operational problem. In this context, the tactical solution is less detailed than the operational one but it has to be computed in very short time and under imperfect information. The problem is of importa…

    Submitted 1 March, 2021; v1 submitted 22 January, 2019; originally announced January 2019.

    Comments: Same as arXiv:1807.11876, added by mistake

    Journal ref: INFORMS Journal on Computing 34(1):227-242, 2021

  49. arXiv:1810.11544  [pdf, other]

    cs.LG cs.AI stat.ML

    Quantifying Learning Guarantees for Convex but Inconsistent Surrogates

    Authors: Kirill Struminsky, Simon Lacoste-Julien, Anton Osokin

    Abstract: We study consistency properties of machine learning methods based on minimizing convex surrogates. We extend the recent framework of Osokin et al. (2017) for the quantitative analysis of consistency properties to the case of inconsistent surrogates. Our key technical contribution consists in a new lower bound on the calibration function for the quadratic surrogate, which is non-trivial (not always…

    Submitted 9 January, 2019; v1 submitted 26 October, 2018; originally announced October 2018.

    Comments: Appears in: Advances in Neural Information Processing Systems 31 (NeurIPS 2018). 18 pages

  50. arXiv:1810.08591  [pdf, other]

    cs.LG stat.ML

    A Modern Take on the Bias-Variance Tradeoff in Neural Networks

    Authors: Brady Neal, Sarthak Mittal, Aristide Baratin, Vinayak Tantia, Matthew Scicluna, Simon Lacoste-Julien, Ioannis Mitliagkas

    Abstract: The bias-variance tradeoff tells us that as model complexity increases, bias falls and variance increases, leading to a U-shaped test error curve. However, recent empirical results with over-parameterized neural networks are marked by a striking absence of the classic U-shaped test error curve: test error keeps decreasing in wider networks. This suggests that there might not be a bias-variance tr…

    Submitted 18 December, 2019; v1 submitted 19 October, 2018; originally announced October 2018.

    Journal ref: ICML 2019 Workshop on Identifying and Understanding Deep Learning Phenomena