
Showing 1–50 of 53 results for author: Mitliagkas, I

Searching in archive cs.
  1. arXiv:2406.09073  [pdf, other]

    cs.LG

    Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition

    Authors: Eleni Triantafillou, Peter Kairouz, Fabian Pedregosa, Jamie Hayes, Meghdad Kurmanji, Kairan Zhao, Vincent Dumoulin, Julio Jacques Junior, Ioannis Mitliagkas, Jun Wan, Lisheng Sun Hosoya, Sergio Escalera, Gintare Karolina Dziugaite, Peter Triantafillou, Isabelle Guyon

    Abstract: We present the findings of the first NeurIPS competition on unlearning, which sought to stimulate the development of novel algorithms and initiate discussions on formal and robust evaluation methodologies. The competition was highly successful: nearly 1,200 teams from across the world participated, and a wealth of novel, imaginative solutions with different characteristics were contributed. In thi…

    Submitted 13 June, 2024; originally announced June 2024.

  2. arXiv:2402.05271  [pdf, other]

    stat.ML cs.AI cs.LG

    Feature learning as alignment: a structural property of gradient descent in non-linear neural networks

    Authors: Daniel Beaglehole, Ioannis Mitliagkas, Atish Agarwala

    Abstract: Understanding the mechanisms through which neural networks extract statistics from input-label pairs through feature learning is one of the most important unsolved problems in supervised learning. Prior works demonstrated that the gram matrices of the weights (the neural feature matrices, NFM) and the average gradient outer products (AGOP) become correlated during training, in a statement known as…

    Submitted 24 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  3. arXiv:2308.11480  [pdf, other]

    cs.LG cs.AI cs.CV

    Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection

    Authors: Charles Guille-Escuret, Pierre-André Noël, Ioannis Mitliagkas, David Vazquez, Joao Monteiro

    Abstract: Improving the reliability of deployed machine learning systems often involves developing methods to detect out-of-distribution (OOD) inputs. However, existing research often narrowly focuses on samples from classes that are absent from the training set, neglecting other types of plausible distribution shifts. This limitation reduces the applicability of these methods in real-world scenarios, where…

    Submitted 22 August, 2023; originally announced August 2023.

  4. arXiv:2307.08187  [pdf, other]

    cs.LG cs.AI

    An Empirical Study of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration

    Authors: Hiroki Naganuma, Ryuichiro Hataya, Ioannis Mitliagkas

    Abstract: In out-of-distribution (OOD) generalization tasks, fine-tuning pre-trained models has become a prevalent strategy. Different from most prior work that has focused on advancing learning algorithms, we systematically examined how pre-trained model size, pre-training dataset size, and training strategies impact generalization and uncertainty calibration on downstream tasks. We evaluated 100 models ac…

    Submitted 30 May, 2024; v1 submitted 16 July, 2023; originally announced July 2023.

  5. arXiv:2307.02598  [pdf, other]

    cs.LG stat.ML

    Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation

    Authors: Sébastien Lachapelle, Divyat Mahajan, Ioannis Mitliagkas, Simon Lacoste-Julien

    Abstract: We tackle the problems of latent variables identification and ``out-of-support'' image generation in representation learning. We show that both are possible for a class of decoders that we call additive, which are reminiscent of decoders used for object-centric representation learning (OCRL) and well suited for images that can be decomposed as a sum of object-specific images. We provide conditions…

    Submitted 2 November, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: Appears in: Advances in Neural Information Processing Systems 37 (NeurIPS 2023). 39 pages

    ACM Class: I.2.6; I.5.1

  6. arXiv:2306.11922  [pdf, other]

    cs.LG math.OC

    No Wrong Turns: The Simple Geometry Of Neural Networks Optimization Paths

    Authors: Charles Guille-Escuret, Hiroki Naganuma, Kilian Fatras, Ioannis Mitliagkas

    Abstract: Understanding the optimization dynamics of neural networks is necessary for closing the gap between theory and practice. Stochastic first-order optimization algorithms are known to efficiently locate favorable minima in deep neural networks. This efficiency, however, contrasts with the non-convex and seemingly complex structure of neural loss landscapes. In this study, we delve into the fundamenta…

    Submitted 20 June, 2023; originally announced June 2023.

  7. arXiv:2304.06879  [pdf, other]

    cs.LG cs.GT

    Performative Prediction with Neural Networks

    Authors: Mehrnaz Mofakhami, Ioannis Mitliagkas, Gauthier Gidel

    Abstract: Performative prediction is a framework for learning models that influence the data they intend to predict. We focus on finding classifiers that are performatively stable, i.e. optimal for the data distribution they induce. Standard convergence results for finding a performatively stable classifier with the method of repeated risk minimization assume that the data distribution is Lipschitz continuo…

    Submitted 13 April, 2023; originally announced April 2023.

    Comments: Published at AISTATS 2023

  8. arXiv:2211.14666  [pdf, other]

    cs.LG stat.ML

    Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning

    Authors: Sébastien Lachapelle, Tristan Deleu, Divyat Mahajan, Ioannis Mitliagkas, Yoshua Bengio, Simon Lacoste-Julien, Quentin Bertrand

    Abstract: Although disentangled representations are often said to be beneficial for downstream tasks, current empirical and theoretical understanding is limited. In this work, we provide evidence that disentangled representations coupled with sparse base-predictors improve generalization. In the context of multi-task learning, we prove a new identifiability result that provides conditions under which maxima…

    Submitted 6 June, 2023; v1 submitted 26 November, 2022; originally announced November 2022.

    Comments: Appears in: Fortieth International Conference on Machine Learning (ICML 2023). 36 pages

    ACM Class: I.2.6; I.5.1

  9. arXiv:2211.08583  [pdf, other]

    cs.LG cs.AI

    Empirical Study on Optimizer Selection for Out-of-Distribution Generalization

    Authors: Hiroki Naganuma, Kartik Ahuja, Shiro Takagi, Tetsuya Motokawa, Rio Yokota, Kohta Ishikawa, Ikuro Sato, Ioannis Mitliagkas

    Abstract: Modern deep learning systems do not generalize well when the test data distribution is slightly different to the training data distribution. While much promising work has been accomplished to address this fragility, a systematic study of the role of optimizers and their out-of-distribution generalization performance has not been undertaken. In this study, we examine the performance of popular firs…

    Submitted 5 June, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: Accepted to TMLR

  10. arXiv:2211.01939  [pdf, other]

    cs.LG cs.AI stat.ME

    Empirical Analysis of Model Selection for Heterogeneous Causal Effect Estimation

    Authors: Divyat Mahajan, Ioannis Mitliagkas, Brady Neal, Vasilis Syrgkanis

    Abstract: We study the problem of model selection in causal inference, specifically for conditional average treatment effect (CATE) estimation. Unlike machine learning, there is no perfect analogue of cross-validation for model selection as we do not observe the counterfactual potential outcomes. Towards this, a variety of surrogate metrics have been proposed for CATE model selection that use only observed…

    Submitted 29 April, 2024; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: Proceedings of the 12th International Conference on Learning Representations (ICLR), 2024. (Spotlight)

  11. arXiv:2210.03150  [pdf, other]

    cs.LG cs.AI

    Towards Out-of-Distribution Adversarial Robustness

    Authors: Adam Ibrahim, Charles Guille-Escuret, Ioannis Mitliagkas, Irina Rish, David Krueger, Pouya Bashivan

    Abstract: Adversarial robustness continues to be a major challenge for deep learning. A core issue is that robustness to one type of attack often fails to transfer to other attacks. While prior work establishes a theoretical trade-off in robustness against different $L_p$ norms, we show that there is potential for improvement against many commonly used attacks by adopting a domain generalisation approach. C…

    Submitted 26 June, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: Version of NeurIPS 2023 submission

  12. arXiv:2210.01742  [pdf, other]

    cs.LG cs.CV

    CADet: Fully Self-Supervised Out-Of-Distribution Detection With Contrastive Learning

    Authors: Charles Guille-Escuret, Pau Rodriguez, David Vazquez, Ioannis Mitliagkas, Joao Monteiro

    Abstract: Handling out-of-distribution (OOD) samples has become a major stake in the real-world deployment of machine learning systems. This work explores the use of self-supervised contrastive learning to the simultaneous detection of two types of OOD samples: unseen classes and adversarial perturbations. First, we pair self-supervised contrastive learning with the maximum mean discrepancy (MMD) two-sample…

    Submitted 27 June, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    Journal ref: Advances in Neural Information Processing Systems 36 (2024)

  13. arXiv:2210.01210  [pdf, ps, other]

    cs.CV cs.LG stat.ML

    A Reproducible and Realistic Evaluation of Partial Domain Adaptation Methods

    Authors: Tiago Salvador, Kilian Fatras, Ioannis Mitliagkas, Adam Oberman

    Abstract: Unsupervised Domain Adaptation (UDA) aims at classifying unlabeled target images leveraging source labeled ones. In this work, we consider the Partial Domain Adaptation (PDA) variant, where we have extra source classes not present in the target domain. Most successful algorithms use model selection strategies that rely on target labels to find the best hyper-parameters and/or models along training…

    Submitted 3 October, 2022; originally announced October 2022.

    Comments: 17 pages, 13 tables

  14. arXiv:2209.14863  [pdf, other]

    stat.ML cs.LG

    Neural Networks Efficiently Learn Low-Dimensional Representations with SGD

    Authors: Alireza Mousavi-Hosseini, Sejun Park, Manuela Girotti, Ioannis Mitliagkas, Murat A. Erdogdu

    Abstract: We study the problem of training a two-layer neural network (NN) of arbitrary width using stochastic gradient descent (SGD) where the input $\boldsymbol{x}\in \mathbb{R}^d$ is Gaussian and the target $y \in \mathbb{R}$ follows a multiple-index model, i.e., $y=g(\langle\boldsymbol{u_1},\boldsymbol{x}\rangle,...,\langle\boldsymbol{u_k},\boldsymbol{x}\rangle)$ with a noisy link function $g$. We prove…

    Submitted 15 March, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: 39 pages, 2 figures. To appear in the International Conference on Learning Representations (ICLR), 2023

  15. arXiv:2206.11180  [pdf, other]

    cs.CV cs.LG stat.ML

    Optimal transport meets noisy label robust loss and MixUp regularization for domain adaptation

    Authors: Kilian Fatras, Hiroki Naganuma, Ioannis Mitliagkas

    Abstract: It is common in computer vision to be confronted with domain shift: images which have the same class but different acquisition conditions. In domain adaptation (DA), one wants to classify unlabeled target images using source labeled images. Unfortunately, deep neural networks trained on a source training set perform poorly on target images which do not belong to the training domain. One strategy t…

    Submitted 22 June, 2022; originally announced June 2022.

  16. arXiv:2206.05825  [pdf, other]

    cs.LG cs.AI cs.GT

    A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games

    Authors: Samuel Sokota, Ryan D'Orazio, J. Zico Kolter, Nicolas Loizou, Marc Lanctot, Ioannis Mitliagkas, Noam Brown, Christian Kroer

    Abstract: This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm. Our contribution is demonstrating the virtues of magnetic mirror descent as both an equilibrium solver and as an approach to reinforcement learning in two-player zero-sum games. These virtues include: 1) Being the first quantal response equili…

    Submitted 11 April, 2023; v1 submitted 12 June, 2022; originally announced June 2022.

  17. arXiv:2204.04606  [pdf, other]

    cs.LG cs.AI stat.ML

    Towards efficient representation identification in supervised learning

    Authors: Kartik Ahuja, Divyat Mahajan, Vasilis Syrgkanis, Ioannis Mitliagkas

    Abstract: Humans have a remarkable ability to disentangle complex sensory inputs (e.g., image, text) into simple factors of variation (e.g., shape, color) without much supervision. This ability has inspired many works that attempt to solve the following question: how do we invert the data generation process to extract those factors with minimal or no supervision? Several works in the literature on non-linea…

    Submitted 10 April, 2022; originally announced April 2022.

    Comments: Proceedings of the First Conference on Causal Learning and Reasoning

  18. arXiv:2110.15412  [pdf, other]

    math.OC cs.LG

    Stochastic Mirror Descent: Convergence Analysis and Adaptive Variants via the Mirror Stochastic Polyak Stepsize

    Authors: Ryan D'Orazio, Nicolas Loizou, Issam Laradji, Ioannis Mitliagkas

    Abstract: We investigate the convergence of stochastic mirror descent (SMD) under interpolation in relatively smooth and smooth convex optimization. In relatively smooth convex optimization we provide new convergence guarantees for SMD with a constant stepsize. For smooth convex optimization we propose a new adaptive stepsize scheme -- the mirror stochastic Polyak stepsize (mSPS). Notably, our convergence r…

    Submitted 24 May, 2023; v1 submitted 28 October, 2021; originally announced October 2021.

  19. arXiv:2110.10815  [pdf, other]

    cs.LG math.OC stat.ML

    Convergence Analysis and Implicit Regularization of Feedback Alignment for Deep Linear Networks

    Authors: Manuela Girotti, Ioannis Mitliagkas, Gauthier Gidel

    Abstract: We theoretically analyze the Feedback Alignment (FA) algorithm, an efficient alternative to backpropagation for training neural networks. We provide convergence guarantees with rates for deep linear networks for both continuous and discrete dynamics. Additionally, we study incremental learning phenomena for shallow linear networks. Interestingly, certain specific initializations imply that negligi…

    Submitted 20 October, 2021; originally announced October 2021.

    Comments: 10 pages (Main) + 19 pages (Appendix), 6 figures

  20. arXiv:2107.00052  [pdf, other]

    cs.LG cs.GT math.OC stat.ML

    Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity

    Authors: Nicolas Loizou, Hugo Berard, Gauthier Gidel, Ioannis Mitliagkas, Simon Lacoste-Julien

    Abstract: Two of the most prominent algorithms for solving unconstrained smooth games are the classical stochastic gradient descent-ascent (SGDA) and the recently introduced stochastic consensus optimization (SCO) [Mescheder et al., 2017]. SGDA is known to converge to a stationary point for specific classes of games, but current convergence analyses require a bounded variance assumption. SCO is used success…

    Submitted 4 November, 2021; v1 submitted 30 June, 2021; originally announced July 2021.

    Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  21. arXiv:2106.06607  [pdf, other]

    cs.LG stat.ML

    Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization

    Authors: Kartik Ahuja, Ethan Caballero, Dinghuai Zhang, Jean-Christophe Gagnon-Audet, Yoshua Bengio, Ioannis Mitliagkas, Irina Rish

    Abstract: The invariance principle from causality is at the heart of notable approaches such as invariant risk minimization (IRM) that seek to address out-of-distribution (OOD) generalization failures. Despite the promising theory, invariance principle-based approaches fail in common classification tasks, where invariant (causal) features capture all the information about the label. Are these failures due t…

    Submitted 20 November, 2022; v1 submitted 11 June, 2021; originally announced June 2021.

  22. arXiv:2105.14080  [pdf, other]

    cs.LG cs.CV math.OC stat.ML

    Gotta Go Fast When Generating Data with Score-Based Models

    Authors: Alexia Jolicoeur-Martineau, Ke Li, Rémi Piché-Taillefer, Tal Kachman, Ioannis Mitliagkas

    Abstract: Score-based (denoising diffusion) generative models have recently gained a lot of success in generating realistic and diverse data. These approaches define a forward diffusion process for transforming data to noise and generate data by reversing it (thereby going from noise to data). Unfortunately, current score-based models generate data very slowly due to the sheer number of score network evalua…

    Submitted 28 May, 2021; originally announced May 2021.

    Comments: Code is available on https://github.com/AlexiaJM/score_sde_fast_sampling

  23. arXiv:2012.05782  [pdf, other]

    cs.LG math.OC

    A Study of Condition Numbers for First-Order Optimization

    Authors: Charles Guille-Escuret, Baptiste Goujaud, Manuela Girotti, Ioannis Mitliagkas

    Abstract: The study of first-order optimization algorithms (FOA) typically starts with assumptions on the objective functions, most commonly smoothness and strong convexity. These metrics are used to tune the hyperparameters of FOA. We introduce a class of perturbations quantified via a new norm, called *-norm. We show that adding a small perturbation to the objective function has an equivalently small impa…

    Submitted 25 December, 2020; v1 submitted 10 December, 2020; originally announced December 2020.

    Journal ref: International Conference on Artificial Intelligence and Statistics. PMLR, 2021. p. 1261-1269

  24. arXiv:2010.13846  [pdf, other]

    cs.LG cs.GT cs.MA math.OC

    LEAD: Min-Max Optimization from a Physical Perspective

    Authors: Reyhane Askari Hemmat, Amartya Mitra, Guillaume Lajoie, Ioannis Mitliagkas

    Abstract: Adversarial formulations such as generative adversarial networks (GANs) have rekindled interest in two-player min-max games. A central obstacle in the optimization of such games is the rotational dynamics that hinder their convergence. In this paper, we show that game optimization shares dynamic properties with particle systems subject to multiple forces, and one can leverage tools from physics to…

    Submitted 21 June, 2023; v1 submitted 26 October, 2020; originally announced October 2020.

  25. arXiv:2010.11924  [pdf, other]

    cs.LG stat.ML

    In Search of Robust Measures of Generalization

    Authors: Gintare Karolina Dziugaite, Alexandre Drouin, Brady Neal, Nitarshan Rajkumar, Ethan Caballero, Linbo Wang, Ioannis Mitliagkas, Daniel M. Roy

    Abstract: One of the principal scientific challenges in deep learning is explaining generalization, i.e., why the particular way the community now trains networks to achieve small training error also leads to small error on held-out data from the same population. It is widely appreciated that some worst-case theories -- such as those based on the VC dimension of the class of predictors induced by modern neu…

    Submitted 20 January, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: 27 pages, 11 figures, 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada

  26. arXiv:2009.05475  [pdf, other]

    cs.LG cs.CV stat.ML

    Adversarial score matching and improved sampling for image generation

    Authors: Alexia Jolicoeur-Martineau, Rémi Piché-Taillefer, Rémi Tachet des Combes, Ioannis Mitliagkas

    Abstract: Denoising Score Matching with Annealed Langevin Sampling (DSM-ALS) has recently found success in generative modeling. The approach works by first training a neural network to estimate the score of a distribution, and then using Langevin dynamics to sample from the data distribution assumed by the score network. Despite the convincing visual quality of samples, this method appears to perform worse…

    Submitted 10 October, 2020; v1 submitted 11 September, 2020; originally announced September 2020.

    Comments: Code at https://github.com/AlexiaJM/AdversarialConsistentScoreMatching

  27. arXiv:2007.04202  [pdf, other]

    cs.LG cs.GT math.OC stat.ML

    Stochastic Hamiltonian Gradient Methods for Smooth Games

    Authors: Nicolas Loizou, Hugo Berard, Alexia Jolicoeur-Martineau, Pascal Vincent, Simon Lacoste-Julien, Ioannis Mitliagkas

    Abstract: The success of adversarial formulations in machine learning has brought renewed motivation for smooth games. In this work, we focus on the class of stochastic Hamiltonian methods and provide the first convergence guarantees for certain classes of stochastic smooth games. We propose a novel unbiased estimator for the stochastic Hamiltonian gradient descent (SHGD) and highlight its benefits. Using t…

    Submitted 8 July, 2020; originally announced July 2020.

    Comments: ICML 2020 - Proceedings of the 37th International Conference on Machine Learning

  28. arXiv:2001.00602  [pdf, other]

    cs.LG math.OC stat.ML

    Accelerating Smooth Games by Manipulating Spectral Shapes

    Authors: Waïss Azizian, Damien Scieur, Ioannis Mitliagkas, Simon Lacoste-Julien, Gauthier Gidel

    Abstract: We use matrix iteration theory to characterize acceleration in smooth games. We define the spectral shape of a family of games as the set containing all eigenvalues of the Jacobians of standard gradient dynamics in the family. Shapes restricted to the real line represent well-understood classes of problems, like minimization. Shapes spanning the complex plane capture the added numerical challenges…

    Submitted 9 March, 2020; v1 submitted 2 January, 2020; originally announced January 2020.

    Comments: Appears in: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020). 34 pages

    MSC Class: G.1.6; I.2.6

    ACM Class: G.1.6; I.2.6

  29. arXiv:1911.00804  [pdf, other]

    cs.LG stat.ML

    Generalizing to unseen domains via distribution matching

    Authors: Isabela Albuquerque, João Monteiro, Mohammad Darvishi, Tiago H. Falk, Ioannis Mitliagkas

    Abstract: Supervised learning results typically rely on assumptions of i.i.d. data. Unfortunately, those assumptions are commonly violated in practice. In this work, we tackle this problem by focusing on domain generalization: a formalization where the data generating process at test time may yield samples from never-before-seen domains (distributions). Our work relies on the following lemma: by minimizing…

    Submitted 15 September, 2021; v1 submitted 2 November, 2019; originally announced November 2019.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  30. arXiv:1910.06922  [pdf, other]

    cs.LG stat.ML

    Gradient penalty from a maximum margin perspective

    Authors: Alexia Jolicoeur-Martineau, Ioannis Mitliagkas

    Abstract: A popular heuristic for improved performance in Generative adversarial networks (GANs) is to use some form of gradient penalty on the discriminator. This gradient penalty was originally motivated by a Wasserstein distance formulation. However, the use of gradient penalty in other GAN formulations is not well motivated. We present a unifying framework of expected margin maximization and show that a…

    Submitted 24 November, 2020; v1 submitted 15 October, 2019; originally announced October 2019.

    Comments: Code at https://github.com/AlexiaJM/MaximumMarginGANs

  31. arXiv:1906.07300  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Linear Lower Bounds and Conditioning of Differentiable Games

    Authors: Adam Ibrahim, Waïss Azizian, Gauthier Gidel, Ioannis Mitliagkas

    Abstract: Recent successes of game-theoretic formulations in ML have caused a resurgence of research interest in differentiable games. Overwhelmingly, that research focuses on methods and upper bounds on their speed of convergence. In this work, we approach the question of fundamental iteration complexity by providing lower bounds to complement the linear (i.e. geometric) upper bounds observed in the litera…

    Submitted 15 September, 2020; v1 submitted 17 June, 2019; originally announced June 2019.

    Comments: ICML 2020 final version

    Journal ref: Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, PMLR 119, 2020

  32. arXiv:1906.05945  [pdf, other]

    cs.LG math.OC stat.ML

    A Tight and Unified Analysis of Gradient-Based Methods for a Whole Spectrum of Games

    Authors: Waïss Azizian, Ioannis Mitliagkas, Simon Lacoste-Julien, Gauthier Gidel

    Abstract: We consider differentiable games where the goal is to find a Nash equilibrium. The machine learning community has recently started using variants of the gradient method (GD). Prime examples are extragradient (EG), the optimistic gradient method (OG) and consensus optimization (CO), which enjoy linear convergence in cases like bilinear games, where the standard GD fails. The full benefits of theses…

    Submitted 7 July, 2020; v1 submitted 13 June, 2019; originally announced June 2019.

    Comments: Appears in: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020). 39 pages. Minor modification regarding prior work in comparison to the AISTATS Proceedings

    ACM Class: G.1.6; I.2.6

  33. arXiv:1906.03532  [pdf, other]

    cs.LG math.OC stat.ML

    Reducing the variance in online optimization by transporting past gradients

    Authors: Sébastien M. R. Arnold, Pierre-Antoine Manzagol, Reza Babanezhad, Ioannis Mitliagkas, Nicolas Le Roux

    Abstract: Most stochastic optimization methods use gradients once before discarding them. While variance reduction methods have shown that reusing past gradients can be beneficial when there is a finite number of datapoints, they do not easily extend to the online setting. One issue is the staleness due to using past gradients. We propose to correct this staleness using the idea of implicit gradient transpo…

    Submitted 18 June, 2019; v1 submitted 8 June, 2019; originally announced June 2019.

    Comments: Open-source implementation available at: https://github.com/seba-1511/igt.pth

  34. arXiv:1905.11382  [pdf, other]

    cs.LG cs.AI stat.ML

    State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations

    Authors: Alex Lamb, Jonathan Binas, Anirudh Goyal, Sandeep Subramanian, Ioannis Mitliagkas, Denis Kazakov, Yoshua Bengio, Michael C. Mozer

    Abstract: Machine learning promises methods that generalize well from finite labeled data. However, the brittleness of existing neural net approaches is revealed by notable failures, such as the existence of adversarial examples that are misclassified despite being nearly identical to a training example, or the inability of recurrent sequence-processing nets to stay on track without teacher forcing. We intr…

    Submitted 26 May, 2019; originally announced May 2019.

    Comments: ICML 2019 [full oral]. arXiv admin note: text overlap with arXiv:1805.08394

  35. arXiv:1904.03257  [pdf, ps, other]

    cs.LG cs.DB cs.DC cs.SE stat.ML

    MLSys: The New Frontier of Machine Learning Systems

    Authors: Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, et al. (44 additional authors not shown)

    Abstract: Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a ne…

    Submitted 1 December, 2019; v1 submitted 29 March, 2019; originally announced April 2019.

  36. arXiv:1901.08680  [pdf, other]

    cs.LG stat.ML

    Multi-objective training of Generative Adversarial Networks with multiple discriminators

    Authors: Isabela Albuquerque, João Monteiro, Thang Doan, Breandan Considine, Tiago Falk, Ioannis Mitliagkas

    Abstract: Recent literature has demonstrated promising results for training Generative Adversarial Networks by employing a set of discriminators, in contrast to the traditional game involving one generator against a single adversary. Such methods perform single-objective optimization on some simple consolidation of the losses, e.g. an arithmetic average. In this work, we revisit the multiple-discriminator s…

    Submitted 24 January, 2019; originally announced January 2019.

    Comments: The first two authors contributed equally to this work

  37. arXiv:1810.08591  [pdf, other]

    cs.LG stat.ML

    A Modern Take on the Bias-Variance Tradeoff in Neural Networks

    Authors: Brady Neal, Sarthak Mittal, Aristide Baratin, Vinayak Tantia, Matthew Scicluna, Simon Lacoste-Julien, Ioannis Mitliagkas

    Abstract: The bias-variance tradeoff tells us that as model complexity increases, bias falls and variance increases, leading to a U-shaped test error curve. However, recent empirical results with over-parameterized neural networks are marked by a striking absence of the classic U-shaped test error curve: test error keeps decreasing in wider networks. This suggests that there might not be a bias-variance tr…

    Submitted 18 December, 2019; v1 submitted 19 October, 2018; originally announced October 2018.

    Journal ref: ICML 2019 Workshop on Identifying and Understanding Deep Learning Phenomena

  38. arXiv:1810.03023  [pdf, other]

    stat.ML cs.LG

    h-detach: Modifying the LSTM Gradient Towards Better Optimization

    Authors: Devansh Arpit, Bhargav Kanuparthi, Giancarlo Kerg, Nan Rosemary Ke, Ioannis Mitliagkas, Yoshua Bengio

    Abstract: Recurrent neural networks are known for their notorious exploding and vanishing gradient problem (EVGP). This problem becomes more evident in tasks where the information needed to correctly solve them exists over long time scales, because EVGP prevents important gradient components from being back-propagated adequately over a large number of steps. We introduce a simple stochastic algorithm (\texti…

    Submitted 9 January, 2019; v1 submitted 6 October, 2018; originally announced October 2018.

    Comments: First two authors contributed equally. Published in ICLR 2019

  39. arXiv:1807.04740  [pdf, other]

    cs.LG stat.ML

    Negative Momentum for Improved Game Dynamics

    Authors: Gauthier Gidel, Reyhane Askari Hemmat, Mohammad Pezeshki, Remi Lepriol, Gabriel Huang, Simon Lacoste-Julien, Ioannis Mitliagkas

    Abstract: Games generalize the single-objective optimization paradigm by introducing different objective functions for different players. Differentiable games often proceed by simultaneous or alternating gradient updates. In machine learning, games are gaining new importance through formulations like generative adversarial networks (GANs) and actor-critic systems. However, compared to single-objective optim…

    Submitted 28 August, 2020; v1 submitted 12 July, 2018; originally announced July 2018.

    Comments: Appears in: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019). Minor changes with respect to the AISTATS version: typo corrected in Thm. 6 (squared condition number instead of condition number; and small change in constant) and dependence in $β$ changed in Theorem 5 for the formal statement; not changing the conclusions. 28 pages

    ACM Class: I.2.6; G.1.6

  40. arXiv:1806.05236  [pdf, other]

    stat.ML cs.AI cs.LG cs.NE

    Manifold Mixup: Better Representations by Interpolating Hidden States

    Authors: Vikas Verma, Alex Lamb, Christopher Beckham, Amir Najafi, Ioannis Mitliagkas, Aaron Courville, David Lopez-Paz, Yoshua Bengio

    Abstract: Deep neural networks excel at learning the training data, but often provide incorrect and confident predictions when evaluated on slightly different test examples. This includes distribution shifts, outliers, and adversarial examples. To address these issues, we propose Manifold Mixup, a simple regularizer that encourages neural networks to predict less confidently on interpolations of hidden repr…

    Submitted 11 May, 2019; v1 submitted 13 June, 2018; originally announced June 2018.

    Comments: To appear in ICML 2019

  41. arXiv:1804.02485  [pdf, other]

    stat.ML cs.LG

    Fortified Networks: Improving the Robustness of Deep Networks by Modeling the Manifold of Hidden Representations

    Authors: Alex Lamb, Jonathan Binas, Anirudh Goyal, Dmitriy Serdyuk, Sandeep Subramanian, Ioannis Mitliagkas, Yoshua Bengio

    Abstract: Deep networks have achieved impressive results across a variety of important tasks. However, a known weakness is a failure to perform well when evaluated on data which differ from the training distribution, even if these differences are very small, as is the case with adversarial examples. We propose Fortified Networks, a simple transformation of existing networks, which fortifies the hidden layers…

    Submitted 6 April, 2018; originally announced April 2018.

    Comments: Under Review ICML 2018

  42. arXiv:1708.05256  [pdf, other]

    cs.PF cs.CV cs.LG

    Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data

    Authors: Thorsten Kurth, Jian Zhang, Nadathur Satish, Ioannis Mitliagkas, Evan Racah, Mostofa Ali Patwary, Tareq Malas, Narayanan Sundaram, Wahid Bhimji, Mikhail Smorkalov, Jack Deslippe, Mikhail Shiryaev, Srinivas Sridharan, Prabhat, Pradeep Dubey

    Abstract: This paper presents the first, 15-PetaFLOP Deep Learning system for solving scientific pattern classification problems on contemporary HPC architectures. We develop supervised convolutional architectures for discriminating signals in high-energy physics data as well as semi-supervised architectures for localizing and classifying extreme weather in climate data. Our Intelcaffe-based implementation…

    Submitted 17 August, 2017; originally announced August 2017.

    Comments: 12 pages, 9 figures

  43. arXiv:1707.05807  [pdf, other]

    stat.ML cs.LG math.PR stat.ME

    Improving Gibbs Sampler Scan Quality with DoGS

    Authors: Ioannis Mitliagkas, Lester Mackey

    Abstract: The pairwise influence matrix of Dobrushin has long been used as an analytical tool to bound the rate of convergence of Gibbs sampling. In this work, we use Dobrushin influence as the basis of a practical tool to certify and efficiently improve the quality of a discrete Gibbs sampler. Our Dobrushin-optimized Gibbs samplers (DoGS) offer customized variable selection orders for a given sampling budg…

    Submitted 18 July, 2017; originally announced July 2017.

    Comments: ICML 2017

  44. arXiv:1707.02670  [pdf, other]

    math.OC cs.DS cs.LG math.NA stat.ML

    Accelerated Stochastic Power Iteration

    Authors: Christopher De Sa, Bryan He, Ioannis Mitliagkas, Christopher Ré, Peng Xu

    Abstract: Principal component analysis (PCA) is one of the most powerful tools in machine learning. The simplest method for PCA, the power iteration, requires $\mathcal{O}(1/Δ)$ full-data passes to recover the principal component of a matrix with eigen-gap $Δ$. Lanczos, a significantly more complex method, achieves an accelerated rate of $\mathcal{O}(1/\sqrt{Δ})$ passes. Modern applications, however, motivate m…

    Submitted 9 July, 2017; originally announced July 2017.

    Comments: 37 pages, 5 figures

  45. arXiv:1707.02392  [pdf, other]

    cs.CV cs.LG

    Learning Representations and Generative Models for 3D Point Clouds

    Authors: Panos Achlioptas, Olga Diamanti, Ioannis Mitliagkas, Leonidas Guibas

    Abstract: Three-dimensional geometric data offer an excellent domain for studying representation learning and generative modeling. In this paper, we look at geometric data represented as point clouds. We introduce a deep AutoEncoder (AE) network with state-of-the-art reconstruction quality and generalization ability. The learned representations outperform existing methods on 3D recognition tasks and enable…

    Submitted 12 June, 2018; v1 submitted 7 July, 2017; originally announced July 2017.

    Journal ref: 35th International Conference on Machine Learning (ICML), 2018

  46. arXiv:1706.03471  [pdf, other]

    stat.ML cs.AI

    YellowFin and the Art of Momentum Tuning

    Authors: Jian Zhang, Ioannis Mitliagkas

    Abstract: Hyperparameter tuning is one of the most time-consuming workloads in deep learning. State-of-the-art optimizers, such as AdaGrad, RMSProp and Adam, reduce this labor by adaptively tuning an individual learning rate for each variable. Recently researchers have shown renewed interest in simpler methods like momentum SGD as they may yield better test metrics. Motivated by this trend, we ask: can simp…

    Submitted 14 February, 2018; v1 submitted 12 June, 2017; originally announced June 2017.

    Comments: Updated to reflect improved stability discussion and work for SysML presentation

  47. arXiv:1606.07365  [pdf, other]

    stat.ML cs.LG

    Parallel SGD: When does averaging help?

    Authors: Jian Zhang, Christopher De Sa, Ioannis Mitliagkas, Christopher Ré

    Abstract: Consider a number of workers running SGD independently on the same pool of data and averaging the models every once in a while -- a common but not well understood practice. We study model averaging as a variance-reducing mechanism and describe two ways in which the frequency of averaging affects convergence. For convex objectives, we show the benefit of frequent averaging depends on the gradient v…

    Submitted 23 June, 2016; originally announced June 2016.

  48. arXiv:1606.04487  [pdf, other]

    cs.DC cs.LG

    Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs

    Authors: Stefan Hadjis, Ce Zhang, Ioannis Mitliagkas, Dan Iter, Christopher Ré

    Abstract: We study the factors affecting training time in multi-device deep learning systems. Given a specification of a convolutional neural network, our goal is to minimize the time to train this model on a cluster of commodity CPUs and GPUs. We first focus on the single-node setting and show that by using standard batching and data-parallel techniques, throughput can be improved by at least 5.5x over sta…

    Submitted 19 October, 2016; v1 submitted 14 June, 2016; originally announced June 2016.

    ACM Class: I.2.6

  49. arXiv:1606.03432  [pdf, other]

    cs.LG cs.AI stat.ML

    Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much

    Authors: Bryan He, Christopher De Sa, Ioannis Mitliagkas, Christopher Ré

    Abstract: Gibbs sampling is a Markov Chain Monte Carlo sampling technique that iteratively samples variables from their conditional distributions. There are two common scan orders for the variables: random scan and systematic scan. Due to the benefits of locality in hardware, systematic scan is commonly used, even though most statistical guarantees are only for random scan. While it has been conjectured tha…

    Submitted 10 June, 2016; originally announced June 2016.

  50. arXiv:1605.09774  [pdf, other]

    stat.ML cs.DC cs.LG math.OC

    Asynchrony begets Momentum, with an Application to Deep Learning

    Authors: Ioannis Mitliagkas, Ce Zhang, Stefan Hadjis, Christopher Ré

    Abstract: Asynchronous methods are widely used in deep learning, but have limited theoretical justification when applied to non-convex problems. We show that running stochastic gradient descent (SGD) in an asynchronous manner can be viewed as adding a momentum-like term to the SGD iteration. Our result does not assume convexity of the objective function, so it is applicable to deep learning systems. We obse…

    Submitted 25 November, 2016; v1 submitted 31 May, 2016; originally announced May 2016.

    Comments: Full version of a paper published in Annual Allerton Conference on Communication, Control, and Computing (Allerton) 2016