Showing 1–33 of 33 results for author: Lopez-Paz, D

Searching in archive cs.
  1. arXiv:2404.19737  [pdf, other]

    cs.CL

    Better & Faster Large Language Models via Multi-token Prediction

    Authors: Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, Gabriel Synnaeve

    Abstract: Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results in higher sample efficiency. More specifically, at each position in the training corpus, we ask the model to predict the following n tokens using n independent output heads, operating on top of a shared m…

    Submitted 30 April, 2024; originally announced April 2024.

  2. arXiv:2310.01202  [pdf, other]

    stat.ML cs.LG

    Unified Uncertainty Calibration

    Authors: Kamalika Chaudhuri, David Lopez-Paz

    Abstract: To build robust, fair, and safe AI systems, we would like our classifiers to say "I don't know" when facing test examples that are difficult or fall outside of the training classes. The ubiquitous strategy to predict under uncertainty is the simplistic reject-or-classify rule: abstain from prediction if epistemic uncertainty is high, classify otherwise. Unfortunately, this recipe does not a…

    Submitted 18 January, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

  3. arXiv:2309.16748  [pdf, other]

    cs.LG cs.AI stat.ML

    Discovering environments with XRM

    Authors: Mohammad Pezeshki, Diane Bouchacourt, Mark Ibrahim, Nicolas Ballas, Pascal Vincent, David Lopez-Paz

    Abstract: Successful out-of-distribution generalization requires environment annotations. Unfortunately, these are resource-intensive to obtain, and their relevance to model performance is limited by the expectations and perceptual biases of human annotators. Therefore, to enable robust AI systems across applications, we must develop algorithms to automatically discover environments inducing broad generaliz…

    Submitted 28 September, 2023; originally announced September 2023.

  4. arXiv:2309.09888  [pdf, other]

    cs.LG cs.AI stat.ML

    Context is Environment

    Authors: Sharut Gupta, Stefanie Jegelka, David Lopez-Paz, Kartik Ahuja

    Abstract: Two lines of work are taking the central stage in AI research. On the one hand, the community is making increasing efforts to build models that discard spurious correlations and generalize better in novel test environments. Unfortunately, the bitter lesson so far is that no proposal convincingly outperforms a simple empirical risk minimization baseline. On the other hand, large language models (LL…

    Submitted 20 September, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: 41 Pages, 4 Figures

  5. arXiv:2305.16704  [pdf, other]

    cs.LG stat.ML

    A Closer Look at In-Context Learning under Distribution Shifts

    Authors: Kartik Ahuja, David Lopez-Paz

    Abstract: In-context learning, a capability that enables a model to learn from input examples on the fly without necessitating weight updates, is a defining characteristic of large language models. In this work, we follow the setting proposed in (Garg et al., 2022) to better understand the generality and limitations of in-context learning from the lens of the simple yet fundamental task of linear regression…

    Submitted 26 May, 2023; originally announced May 2023.

  6. arXiv:2212.10445  [pdf, other]

    cs.LG cs.AI cs.CV

    Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization

    Authors: Alexandre Ramé, Kartik Ahuja, Jianyu Zhang, Matthieu Cord, Léon Bottou, David Lopez-Paz

    Abstract: Foundation models are redefining how AI systems are built. Practitioners now follow a standard procedure to build their machine learning solutions: from a pre-trained foundation model, they fine-tune the weights on the target task of interest. So, the Internet is swarmed by a handful of foundation models fine-tuned on many diverse tasks: these individual fine-tunings exist in isolation without ben…

    Submitted 9 August, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: 24 pages, 10 tables, 21 figures

  7. arXiv:2211.01866  [pdf, other]

    cs.CV cs.LG

    ImageNet-X: Understanding Model Mistakes with Factor of Variation Annotations

    Authors: Badr Youbi Idrissi, Diane Bouchacourt, Randall Balestriero, Ivan Evtimov, Caner Hazirbas, Nicolas Ballas, Pascal Vincent, Michal Drozdzal, David Lopez-Paz, Mark Ibrahim

    Abstract: Deep learning vision systems are widely deployed across applications where reliability is critical. However, even today's best models can fail to recognize an object when its pose, lighting, or background varies. While existing benchmarks surface examples challenging for models, they do not explain why such mistakes arise. To address this need, we introduce ImageNet-X, a set of sixteen human annot…

    Submitted 3 November, 2022; originally announced November 2022.

  8. arXiv:2207.09960  [pdf, other]

    stat.ML cs.CY cs.LG

    Measuring and signing fairness as performance under multiple stakeholder distributions

    Authors: David Lopez-Paz, Diane Bouchacourt, Levent Sagun, Nicolas Usunier

    Abstract: As learning machines increase their influence on decisions concerning human lives, analyzing their fairness properties becomes a subject of central importance. Yet, our best tools for measuring the fairness of learning systems are rigid fairness metrics encapsulated as mathematical one-liners, offer limited power to the stakeholders involved in the prediction task, and are easy to manipulate when…

    Submitted 20 July, 2022; originally announced July 2022.

  9. arXiv:2205.11672  [pdf, other]

    stat.ML cs.LG

    Why does Throwing Away Data Improve Worst-Group Error?

    Authors: Kamalika Chaudhuri, Kartik Ahuja, Martin Arjovsky, David Lopez-Paz

    Abstract: When facing data with imbalanced classes or groups, practitioners follow an intriguing strategy to achieve best results. They throw away examples until the classes or groups are balanced in size, and then perform empirical risk minimization on the reduced training set. This opposes common wisdom in learning theory, where the expected error is supposed to decrease as the dataset grows in size. In t…

    Submitted 21 February, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

  10. arXiv:2203.15516  [pdf, other]

    cs.LG cs.AI

    Rich Feature Construction for the Optimization-Generalization Dilemma

    Authors: Jianyu Zhang, David Lopez-Paz, Léon Bottou

    Abstract: There often is a dilemma between ease of optimization and robust out-of-distribution (OoD) generalization. For instance, many OoD methods rely on penalty terms whose optimization is challenging. They are either too strong to optimize reliably or too weak to achieve their goals. We propose to initialize the networks with a rich representation containing a palette of potentially useful features, r…

    Submitted 8 July, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: 15 pages, ICML2022

  11. arXiv:2110.14503  [pdf, other]

    cs.LG cs.AI cs.CR

    Simple data balancing achieves competitive worst-group-accuracy

    Authors: Badr Youbi Idrissi, Martin Arjovsky, Mohammad Pezeshki, David Lopez-Paz

    Abstract: We study the problem of learning classifiers that perform well across (known or unknown) groups of data. After observing that common worst-group-accuracy datasets suffer from substantial imbalances, we set out to compare state-of-the-art methods to simple balancing of classes and groups by either subsampling or reweighting data. Our results show that these data balancing baselines achieve state-of…

    Submitted 18 February, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: Accepted at CLeaR (Causal Learning and Reasoning) 2022

  12. arXiv:2107.06217  [pdf, ps, other]

    cs.LG cs.AI

    What classifiers know what they don't?

    Authors: Mohamed Ishmael Belghazi, David Lopez-Paz

    Abstract: Being uncertain when facing the unknown is key to intelligent decision making. However, machine learning algorithms lack reliable estimates about their predictive uncertainty. This leads to wrong and overly-confident decisions when encountering classes unseen during training. Despite the importance of equipping classifiers with uncertainty estimates ready for the real world, prior work has focused…

    Submitted 13 July, 2021; originally announced July 2021.

    Comments: 27 pages

  13. arXiv:2102.10867  [pdf, other]

    cs.LG cs.AI

    Linear unit-tests for invariance discovery

    Authors: Benjamin Aubin, Agnieszka Słowik, Martin Arjovsky, Leon Bottou, David Lopez-Paz

    Abstract: There is an increasing interest in algorithms to learn invariant correlations across training environments. A big share of the current proposals find theoretical support in the causality literature but, how useful are they in practice? The purpose of this note is to propose six linear low-dimensional problems -- unit tests -- to evaluate different types of out-of-distribution generalization in a p…

    Submitted 22 February, 2021; originally announced February 2021.

    Comments: 5 pages, Causal Discovery & Causality-Inspired Machine Learning Workshop at Neural Information Processing Systems

  14. arXiv:2007.01434  [pdf, other]

    cs.LG stat.ML

    In Search of Lost Domain Generalization

    Authors: Ishaan Gulrajani, David Lopez-Paz

    Abstract: The goal of domain generalization algorithms is to predict well on distributions different from those seen during training. While a myriad of domain generalization algorithms exist, inconsistencies in experimental conditions -- datasets, architectures, and model selection criteria -- render fair and realistic comparisons difficult. In this paper, we are interested in understanding how useful domai…

    Submitted 2 July, 2020; originally announced July 2020.

  15. arXiv:2002.08165  [pdf, other]

    cs.LG stat.ML

    Using Hindsight to Anchor Past Knowledge in Continual Learning

    Authors: Arslan Chaudhry, Albert Gordo, Puneet K. Dokania, Philip Torr, David Lopez-Paz

    Abstract: In continual learning, the learner faces a stream of data whose distribution changes over time. Modern neural networks are known to suffer under this setting, as they quickly forget previously acquired knowledge. To address such catastrophic forgetting, many continual learning methods implement different types of experience replay, re-learning on past data stored in a small buffer known as episodi…

    Submitted 2 March, 2021; v1 submitted 19 February, 2020; originally announced February 2020.

    Comments: Accepted at AAAI 2021

  16. arXiv:1907.02893  [pdf, other]

    stat.ML cs.AI cs.LG

    Invariant Risk Minimization

    Authors: Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, David Lopez-Paz

    Abstract: We introduce Invariant Risk Minimization (IRM), a learning paradigm to estimate invariant correlations across multiple training distributions. To achieve this goal, IRM learns a data representation such that the optimal classifier, on top of that data representation, matches for all training distributions. Through theory and experiments, we show how the invariances learned by IRM relate to the cau…

    Submitted 27 March, 2020; v1 submitted 5 July, 2019; originally announced July 2019.

  17. arXiv:1903.03825  [pdf]

    stat.ML cs.AI cs.LG

    Interpolation Consistency Training for Semi-Supervised Learning

    Authors: Vikas Verma, Kenji Kawaguchi, Alex Lamb, Juho Kannala, Arno Solin, Yoshua Bengio, David Lopez-Paz

    Abstract: We introduce Interpolation Consistency Training (ICT), a simple and computation efficient algorithm for training Deep Neural Networks in the semi-supervised learning paradigm. ICT encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of the predictions at those points. In classification problems, ICT moves the decision boundary to low-density reg…

    Submitted 19 October, 2022; v1 submitted 9 March, 2019; originally announced March 2019.

    Comments: This is the latest version, which is published in the Journal, "Neural Networks", in 2022. All the previous results are unchanged. Keyword: Deep Learning, Semi-supervised Learning, Mixup

    Journal ref: Neural Networks, volume 145, pages 90-106 (2022)

  18. arXiv:1902.08401  [pdf, other]

    cs.LG stat.ML

    Learning about an exponential amount of conditional distributions

    Authors: Mohamed Ishmael Belghazi, Maxime Oquab, Yann LeCun, David Lopez-Paz

    Abstract: We introduce the Neural Conditioner (NC), a self-supervised machine able to learn about all the conditional distributions of a random vector $X$. The NC is a function $NC(x \cdot a, a, r)$ that leverages adversarial training to match each conditional distribution $P(X_r|X_a=x_a)$. After training, the NC generalizes to sample from conditional distributions never seen, including the joint distributi…

    Submitted 22 February, 2019; originally announced February 2019.

    Comments: 8 pages, 7 figures

  19. arXiv:1811.00908  [pdf, other]

    stat.ML cs.LG

    Single-Model Uncertainties for Deep Learning

    Authors: Natasa Tagasovska, David Lopez-Paz

    Abstract: We provide single-model estimates of aleatoric and epistemic uncertainty for deep neural networks. To estimate aleatoric uncertainty, we propose Simultaneous Quantile Regression (SQR), a loss function to learn all the conditional quantiles of a given target variable. These quantiles can be used to compute well-calibrated prediction intervals. To estimate epistemic uncertainty, we propose Orthonorm…

    Submitted 6 September, 2019; v1 submitted 2 November, 2018; originally announced November 2018.

    Comments: To appear in NeurIPS 2019

  20. arXiv:1806.05236  [pdf, other]

    stat.ML cs.AI cs.LG cs.NE

    Manifold Mixup: Better Representations by Interpolating Hidden States

    Authors: Vikas Verma, Alex Lamb, Christopher Beckham, Amir Najafi, Ioannis Mitliagkas, Aaron Courville, David Lopez-Paz, Yoshua Bengio

    Abstract: Deep neural networks excel at learning the training data, but often provide incorrect and confident predictions when evaluated on slightly different test examples. This includes distribution shifts, outliers, and adversarial examples. To address these issues, we propose Manifold Mixup, a simple regularizer that encourages neural networks to predict less confidently on interpolations of hidden repr…

    Submitted 11 May, 2019; v1 submitted 13 June, 2018; originally announced June 2018.

    Comments: To appear in ICML 2019

  21. arXiv:1802.01421  [pdf, other]

    stat.ML cs.CV cs.LG

    First-order Adversarial Vulnerability of Neural Networks and Input Dimension

    Authors: Carl-Johann Simon-Gabriel, Yann Ollivier, Léon Bottou, Bernhard Schölkopf, David Lopez-Paz

    Abstract: Over the past few years, neural networks were proven vulnerable to adversarial images: targeted but imperceptible image perturbations lead to drastically different predictions. We show that adversarial vulnerability increases with the gradients of the training objective when viewed as a function of the inputs. Surprisingly, vulnerability does not depend on network topology: for many standard netwo…

    Submitted 16 June, 2019; v1 submitted 5 February, 2018; originally announced February 2018.

    Comments: Paper previously called: "Adversarial Vulnerability of Neural Networks Increases with Input Dimension". 9 pages main text and references, 11 pages appendix, 14 figures

    MSC Class: 68T45 ACM Class: I.2.6

    Journal ref: Proceedings of ICML 2019

  22. arXiv:1712.07822  [pdf, other]

    stat.ML cs.AI cs.LG

    Geometrical Insights for Implicit Generative Modeling

    Authors: Leon Bottou, Martin Arjovsky, David Lopez-Paz, Maxime Oquab

    Abstract: Learning algorithms for implicit generative models can optimize a variety of criteria that measure how the data distribution differs from the implicit model distribution, including the Wasserstein distance, the Energy distance, and the Maximum Mean Discrepancy criterion. A careful look at the geometries induced by these distances on the space of probability measures reveals interesting differences…

    Submitted 21 August, 2019; v1 submitted 21 December, 2017; originally announced December 2017.

    Comments: this version fixes a typo in a definition

  23. arXiv:1710.09412  [pdf, other]

    cs.LG stat.ML

    mixup: Beyond Empirical Risk Minimization

    Authors: Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz

    Abstract: Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear…

    Submitted 27 April, 2018; v1 submitted 25 October, 2017; originally announced October 2017.

    Comments: ICLR camera ready version. Changes vs V1: fix repo URL; add ablation studies; add mixup + dropout etc

  24. arXiv:1707.05776  [pdf, other]

    stat.ML cs.CV cs.LG

    Optimizing the Latent Space of Generative Networks

    Authors: Piotr Bojanowski, Armand Joulin, David Lopez-Paz, Arthur Szlam

    Abstract: Generative Adversarial Networks (GANs) have achieved remarkable results in the task of generating realistic natural images. In most successful applications, GAN models share two common aspects: solving a challenging saddle point optimization problem, interpreted as an adversarial game between a generator and a discriminator functions; and parameterizing the generator and the discriminator as deep…

    Submitted 20 May, 2019; v1 submitted 18 July, 2017; originally announced July 2017.

  25. arXiv:1706.08840  [pdf, other]

    cs.LG cs.AI

    Gradient Episodic Memory for Continual Learning

    Authors: David Lopez-Paz, Marc'Aurelio Ranzato

    Abstract: One major obstacle towards AI is the poor ability of models to solve new problems quicker, and without forgetting previously acquired knowledge. To better understand this issue, we study the problem of continual learning, where the model observes, once and one by one, examples concerning a sequence of tasks. First, we propose a set of metrics to evaluate models learning over a continuum of data. T…

    Submitted 13 September, 2022; v1 submitted 26 June, 2017; originally announced June 2017.

    Comments: Published at NIPS 2017

  26. arXiv:1702.07306  [pdf, other]

    stat.ML cs.LG

    Causal Discovery Using Proxy Variables

    Authors: Mateo Rojas-Carulla, Marco Baroni, David Lopez-Paz

    Abstract: Discovering causal relations is fundamental to reasoning and intelligence. In particular, observational causal discovery algorithms estimate the cause-effect relation between two random entities $X$ and $Y$, given $n$ samples from $P(X,Y)$. In this paper, we develop a framework to estimate the cause-effect relation between two static entities $x$ and $y$: for instance, an art masterpiece $x$ and…

    Submitted 23 February, 2017; originally announced February 2017.

  27. arXiv:1611.08648  [pdf, other]

    cs.CR cs.CY cs.LG stat.ML

    Patient-Driven Privacy Control through Generalized Distillation

    Authors: Z. Berkay Celik, David Lopez-Paz, Patrick McDaniel

    Abstract: The introduction of data analytics into medicine has changed the nature of patient treatment. In this, patients are asked to disclose personal information such as genetic markers, lifestyle habits, and clinical history. This data is then used by statistical models to predict personalized treatments. However, due to privacy concerns, patients often desire to withhold sensitive information. This sel…

    Submitted 13 October, 2017; v1 submitted 25 November, 2016; originally announced November 2016.

    Comments: IEEE Symposium on Privacy-Aware Computing (IEEE PAC), 2017

  28. arXiv:1605.08179  [pdf, other]

    stat.ML cs.CV

    Discovering Causal Signals in Images

    Authors: David Lopez-Paz, Robert Nishihara, Soumith Chintala, Bernhard Schölkopf, Léon Bottou

    Abstract: This paper establishes the existence of observable footprints that reveal the "causal dispositions" of the object categories appearing in collections of images. We achieve this goal in two steps. First, we take a learning approach to observational causal discovery, and build a classifier that achieves state-of-the-art performance on finding the causal direction between pairs of random variables, g…

    Submitted 31 October, 2017; v1 submitted 26 May, 2016; originally announced May 2016.

  29. arXiv:1602.03027  [pdf, ps, other]

    stat.ML cs.LG

    Minimax Lower Bounds for Realizable Transductive Classification

    Authors: Ilya Tolstikhin, David Lopez-Paz

    Abstract: Transductive learning considers a training set of $m$ labeled samples and a test set of $u$ unlabeled samples, with the goal of best labeling that particular test set. Conversely, inductive learning considers a training set of $m$ labeled samples drawn iid from $P(X,Y)$, with the goal of best labeling any future samples drawn iid from $P(X)$. This comparison suggests that transduction is a much ea…

    Submitted 9 February, 2016; originally announced February 2016.

  30. arXiv:1511.03643  [pdf, other]

    stat.ML cs.LG

    Unifying distillation and privileged information

    Authors: David Lopez-Paz, Léon Bottou, Bernhard Schölkopf, Vladimir Vapnik

    Abstract: Distillation (Hinton et al., 2015) and privileged information (Vapnik & Izmailov, 2015) are two techniques that enable machines to learn from other machines. This paper unifies these two techniques into generalized distillation, a framework to learn from multiple machines and data representations. We provide theoretical and causal insight about the inner workings of generalized distillation, exten…

    Submitted 25 February, 2016; v1 submitted 11 November, 2015; originally announced November 2015.

    Journal ref: Proceedings of the International Conference on Learning Representations (2016) 1-10

  31. arXiv:1508.02933  [pdf, ps, other]

    stat.ML cs.LG math.OC math.ST

    No Regret Bound for Extreme Bandits

    Authors: Robert Nishihara, David Lopez-Paz, Léon Bottou

    Abstract: Algorithms for hyperparameter optimization abound, all of which work well under different and often unverifiable assumptions. Motivated by the general challenge of sequentially choosing which algorithm to use, we study the more specific task of choosing among distributions to use for random hyperparameter optimization. This work is naturally framed in the extreme bandit setting, which deals with s…

    Submitted 11 April, 2016; v1 submitted 12 August, 2015; originally announced August 2015.

    Comments: 11 pages, International Conference on Artificial Intelligence and Statistics, 2016

  32. arXiv:1402.0119  [pdf, other]

    stat.ML cs.LG

    Randomized Nonlinear Component Analysis

    Authors: David Lopez-Paz, Suvrit Sra, Alex Smola, Zoubin Ghahramani, Bernhard Schölkopf

    Abstract: Classical methods such as Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA) are ubiquitous in statistics. However, these techniques are only able to reveal linear relationships in data. Although nonlinear variants of PCA and CCA have been proposed, these are computationally prohibitive in the large scale. In a separate strand of recent research, randomized methods have…

    Submitted 13 May, 2014; v1 submitted 1 February, 2014; originally announced February 2014.

    Comments: Appearing in ICML 2014

  33. arXiv:1301.0142  [pdf, other]

    stat.ML cs.LG

    Semi-Supervised Domain Adaptation with Non-Parametric Copulas

    Authors: David Lopez-Paz, José Miguel Hernández-Lobato, Bernhard Schölkopf

    Abstract: A new framework based on the theory of copulas is proposed to address semi-supervised domain adaptation problems. The presented method factorizes any multivariate density into a product of marginal distributions and bivariate copula functions. Therefore, changes in each of these factors can be detected and corrected to adapt a density model across different learning domains. Importantly, we…

    Submitted 1 January, 2013; originally announced January 2013.

    Comments: 9 pages, Appearing on Advances in Neural Information Processing Systems 25