
Showing 1–24 of 24 results for author: Bousquet, O

Searching in archive stat.
  1. arXiv:2212.04216  [pdf, ps, other]

    cs.LG cs.CR stat.ML

    Differentially-Private Bayes Consistency

    Authors: Olivier Bousquet, Haim Kaplan, Aryeh Kontorovich, Yishay Mansour, Shay Moran, Menachem Sadigurschi, Uri Stemmer

    Abstract: We construct a universally Bayes consistent learning rule that satisfies differential privacy (DP). We first handle the setting of binary classification and then extend our rule to the more general setting of density estimation (with respect to the total variation metric). The existence of a universally consistent DP learner reveals a stark difference with the distribution-free PAC model. Indeed,…

    Submitted 8 December, 2022; originally announced December 2022.

  2. arXiv:2210.01513  [pdf, other]

    cs.LG math.OC stat.ML

    The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima

    Authors: Peter L. Bartlett, Philip M. Long, Olivier Bousquet

    Abstract: We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization method for deep networks that has exhibited performance improvements on image and language prediction problems. We show that when SAM is applied with a convex quadratic objective, for most random initializations it converges to a cycle that oscillates between either side of the minimum in the direction with the largest c…

    Submitted 11 April, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

  3. arXiv:2208.14615  [pdf, other]

    cs.LG cs.CC stat.ML

    Fine-Grained Distribution-Dependent Learning Curves

    Authors: Olivier Bousquet, Steve Hanneke, Shay Moran, Jonathan Shafer, Ilya Tolstikhin

    Abstract: Learning curves plot the expected error of a learning algorithm as a function of the number of labeled samples it receives from a target distribution. They are widely used as a measure of an algorithm's performance, but classic PAC learning theory cannot explain their behavior. As observed by Antos and Lugosi (1996, 1998), the classic 'No Free Lunch' lower bounds only trace the upper envelope a…

    Submitted 10 November, 2022; v1 submitted 30 August, 2022; originally announced August 2022.

  4. arXiv:2011.04483  [pdf, ps, other]

    cs.LG cs.DS math.ST stat.ML

    A Theory of Universal Learning

    Authors: Olivier Bousquet, Steve Hanneke, Shay Moran, Ramon van Handel, Amir Yehudayoff

    Abstract: How quickly can a given class of concepts be learned from examples? It is common to measure the performance of a supervised machine learning algorithm by plotting its "learning curve", that is, the decay of the error rate as a function of the number of training examples. However, the classical theoretical framework for understanding learnability, the PAC model of Vapnik-Chervonenkis and Valiant, d…

    Submitted 9 November, 2020; originally announced November 2020.

  5. arXiv:2006.10455  [pdf, other]

    stat.ML cs.LG

    What Do Neural Networks Learn When Trained With Random Labels?

    Authors: Hartmut Maennel, Ibrahim Alabdulmohsin, Ilya Tolstikhin, Robert J. N. Baldock, Olivier Bousquet, Sylvain Gelly, Daniel Keysers

    Abstract: We study deep neural networks (DNNs) trained on natural image data with entirely random labels. Despite its popularity in the literature, where it is often used to study memorization, generalization, and other phenomena, little is known about what DNNs learn in this setting. In this paper, we show analytically for convolutional and fully connected networks that an alignment between the principal c…

    Submitted 11 November, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: Accepted at NeurIPS 2020

  6. arXiv:2005.11818  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    Proper Learning, Helly Number, and an Optimal SVM Bound

    Authors: Olivier Bousquet, Steve Hanneke, Shay Moran, Nikita Zhivotovskiy

    Abstract: The classical PAC sample complexity bounds are stated for any Empirical Risk Minimizer (ERM) and contain an extra logarithmic factor $\log(1/\varepsilon)$ which is known to be necessary for ERM in general. It has been recently shown by Hanneke (2016) that the optimal sample complexity of PAC learning for any VC class C is achieved by a particular improper learning algorithm, which outputs a specific majorit…

    Submitted 24 May, 2020; originally announced May 2020.

  7. arXiv:2002.11448  [pdf, other]

    stat.ML cs.LG

    Predicting Neural Network Accuracy from Weights

    Authors: Thomas Unterthiner, Daniel Keysers, Sylvain Gelly, Olivier Bousquet, Ilya Tolstikhin

    Abstract: We show experimentally that the accuracy of a trained neural network can be predicted surprisingly well by looking only at its weights, without evaluating it on input data. We motivate this task and introduce a formal setting for it. Even when using simple statistics of the weights, the predictors are able to rank neural networks by their performance with very high accuracy (R2 score more than 0.9…

    Submitted 9 April, 2021; v1 submitted 26 February, 2020; originally announced February 2020.

    Comments: Updated the Small CNN Zoo dataset: reduced the maximal learning rate and got rid of multiple bad runs. Replaced all the experiments with the new numbers. Added MLP. Fixed typo in the abstract (R2 score instead of Kendall's tau). Added several earlier related works to the literature overview

  8. arXiv:1912.09713  [pdf, other]

    cs.LG cs.CL stat.ML

    Measuring Compositional Generalization: A Comprehensive Method on Realistic Data

    Authors: Daniel Keysers, Nathanael Schärli, Nathan Scales, Hylke Buisman, Daniel Furrer, Sergii Kashubin, Nikola Momchev, Danila Sinopalnikov, Lukasz Stafiniak, Tibor Tihon, Dmitry Tsarkov, Xiao Wang, Marc van Zee, Olivier Bousquet

    Abstract: State-of-the-art machine learning methods exhibit limited compositional generalization. At the same time, there is a lack of realistic benchmarks that comprehensively measure this ability, which makes it challenging to find and evaluate improvements. We introduce a novel method to systematically construct such benchmarks by maximizing compound divergence while guaranteeing a small atom divergence…

    Submitted 25 June, 2020; v1 submitted 20 December, 2019; originally announced December 2019.

    Comments: Accepted for publication at ICLR 2020

  9. arXiv:1910.12756  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Fast classification rates without standard margin assumptions

    Authors: Olivier Bousquet, Nikita Zhivotovskiy

    Abstract: We consider the classical problem of learning rates for classes with finite VC dimension. It is well known that fast learning rates up to $O\left(\frac{d}{n}\right)$ are achievable by the empirical risk minimization algorithm (ERM) if low noise or margin assumptions are satisfied. These usually require the optimal Bayes classifier to be in the class, and it has been shown that when this is not the…

    Submitted 26 October, 2020; v1 submitted 28 October, 2019; originally announced October 2019.

    Comments: 29 pages, 1 figure; presentation changed according to the referees' suggestions

  10. arXiv:1910.07833  [pdf, ps, other]

    cs.LG math.PR stat.ML

    Sharper bounds for uniformly stable algorithms

    Authors: Olivier Bousquet, Yegor Klochkov, Nikita Zhivotovskiy

    Abstract: Deriving generalization bounds for stable algorithms is a classical question in learning theory taking its roots in the early works by Vapnik and Chervonenkis (1974) and Rogers and Wagner (1978). In a series of recent breakthrough papers by Feldman and Vondrak (2018, 2019), it was shown that the best known high probability upper bounds for uniformly stable learning algorithms due to Bousquet and E…

    Submitted 26 May, 2020; v1 submitted 17 October, 2019; originally announced October 2019.

    Comments: 17 pages, minor improvements, to appear in COLT

  11. arXiv:1910.04867  [pdf, other]

    cs.CV cs.LG stat.ML

    A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark

    Authors: Xiaohua Zhai, Joan Puigcerver, Alexander Kolesnikov, Pierre Ruyssen, Carlos Riquelme, Mario Lucic, Josip Djolonga, Andre Susano Pinto, Maxim Neumann, Alexey Dosovitskiy, Lucas Beyer, Olivier Bachem, Michael Tschannen, Marcin Michalski, Olivier Bousquet, Sylvain Gelly, Neil Houlsby

    Abstract: Representation learning promises to unlock deep learning for the long tail of vision tasks without expensive labelled datasets. Yet, the absence of a unified evaluation for general visual representations hinders progress. Popular protocols are often too constrained (linear classification), limited in diversity (ImageNet, CIFAR, Pascal-VOC), or only weakly related to representation quality (ELBO, r…

    Submitted 21 February, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

  12. arXiv:1907.11180  [pdf, other]

    cs.LG stat.ML

    Google Research Football: A Novel Reinforcement Learning Environment

    Authors: Karol Kurach, Anton Raichuk, Piotr Stańczyk, Michał Zając, Olivier Bachem, Lasse Espeholt, Carlos Riquelme, Damien Vincent, Marcin Michalski, Olivier Bousquet, Sylvain Gelly

    Abstract: Recent progress in the field of reinforcement learning has been accelerated by virtual learning environments such as video games, where novel algorithms and ideas can be quickly tested in a safe and reproducible manner. We introduce the Google Research Football Environment, a new reinforcement learning environment where agents are trained to play football in an advanced, physics-based 3D simulator…

    Submitted 14 April, 2020; v1 submitted 25 July, 2019; originally announced July 2019.

  13. arXiv:1905.11866  [pdf, ps, other]

    cs.LG stat.ML

    When can unlabeled data improve the learning rate?

    Authors: Christina Göpfert, Shai Ben-David, Olivier Bousquet, Sylvain Gelly, Ilya Tolstikhin, Ruth Urner

    Abstract: In semi-supervised classification, one is given access both to labeled and unlabeled data. As unlabeled data is typically cheaper to acquire than labeled data, this setup becomes advantageous as soon as one can exploit the unlabeled data in order to produce a better classifier than with labeled data alone. However, the conditions under which such an improvement is possible are not fully understood…

    Submitted 9 February, 2022; v1 submitted 28 May, 2019; originally announced May 2019.

    Comments: Small correction in proof of Theorem 1

    Journal ref: Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:1500-1518, 2019

  14. arXiv:1905.11112  [pdf, other]

    stat.ML cs.IT cs.LG

    Practical and Consistent Estimation of f-Divergences

    Authors: Paul K. Rubenstein, Olivier Bousquet, Josip Djolonga, Carlos Riquelme, Ilya Tolstikhin

    Abstract: The estimation of an f-divergence between two probability distributions based on samples is a fundamental problem in statistics and machine learning. Most works study this problem under very weak assumptions, in which case it is provably hard. We consider the case of stronger structural assumptions that are commonly satisfied in modern machine learning, including representation learning and genera…

    Submitted 24 October, 2019; v1 submitted 27 May, 2019; originally announced May 2019.

    Comments: Accepted to the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

    Journal ref: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

  15. arXiv:1905.10768  [pdf, other]

    cs.LG stat.ML

    Precision-Recall Curves Using Information Divergence Frontiers

    Authors: Josip Djolonga, Mario Lucic, Marco Cuturi, Olivier Bachem, Olivier Bousquet, Sylvain Gelly

    Abstract: Despite the tremendous progress in the estimation of generative models, the development of tools for diagnosing their failures and assessing their performance has advanced at a much slower pace. Recent developments have investigated metrics that quantify which parts of the true distribution are modeled well, and, on the contrary, what the model fails to capture, akin to precision and recall in info…

    Submitted 8 June, 2020; v1 submitted 26 May, 2019; originally announced May 2019.

    Comments: Updated to the AISTATS 2020 version

  16. arXiv:1902.05876  [pdf, other]

    cs.LG cs.CC cs.IT math.PR stat.ML

    The Optimal Approximation Factor in Density Estimation

    Authors: Olivier Bousquet, Daniel Kane, Shay Moran

    Abstract: Consider the following problem: given two arbitrary densities $q_1,q_2$ and a sample-access to an unknown target density $p$, find which of the $q_i$'s is closer to $p$ in total variation. A remarkable result due to Yatracos shows that this problem is tractable in the following sense: there exists an algorithm that uses $O(\varepsilon^{-2})$ samples from $p$ and outputs $q_i$ such that with high probabili…

    Submitted 2 April, 2020; v1 submitted 10 February, 2019; originally announced February 2019.

    Comments: Fixed a couple of typos

  17. arXiv:1902.03468  [pdf, ps, other]

    cs.LG stat.ML

    Synthetic Data Generators: Sequential and Private

    Authors: Olivier Bousquet, Roi Livni, Shay Moran

    Abstract: We study the sample complexity of private synthetic data generation over an unbounded sized class of statistical queries, and show that any class that is privately proper PAC learnable admits a private synthetic data generator (perhaps non-efficient). Previous work on synthetic data generators focused on the case that the query class $\mathcal{D}$ is finite and obtained sample complexity bounds th…

    Submitted 7 December, 2020; v1 submitted 9 February, 2019; originally announced February 2019.

  18. arXiv:1806.00035  [pdf, other]

    stat.ML cs.LG

    Assessing Generative Models via Precision and Recall

    Authors: Mehdi S. M. Sajjadi, Olivier Bachem, Mario Lucic, Olivier Bousquet, Sylvain Gelly

    Abstract: Recent advances in generative modeling have led to an increased interest in the study of statistical divergences as means of model comparison. Commonly used evaluation methods, such as the Frechet Inception Distance (FID), correlate well with the perceived quality of samples and are sensitive to mode dropping. However, these metrics are unable to distinguish between different failure cases since t…

    Submitted 28 October, 2018; v1 submitted 31 May, 2018; originally announced June 2018.

    Comments: NIPS 2018

  19. arXiv:1803.08367  [pdf, other]

    stat.ML cs.LG

    Gradient Descent Quantizes ReLU Network Features

    Authors: Hartmut Maennel, Olivier Bousquet, Sylvain Gelly

    Abstract: Deep neural networks are often trained in the over-parametrized regime (i.e. with far more parameters than training examples), and understanding why the training converges to solutions that generalize remains an open problem. Several studies have highlighted the fact that the training procedure, i.e. mini-batch Stochastic Gradient Descent (SGD) leads to solutions that have specific properties in t…

    Submitted 22 March, 2018; originally announced March 2018.

  20. arXiv:1711.10337  [pdf, other]

    stat.ML cs.LG

    Are GANs Created Equal? A Large-Scale Study

    Authors: Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, Olivier Bousquet

    Abstract: Generative adversarial networks (GAN) are a powerful subclass of generative models. Despite a very rich research activity leading to numerous interesting GAN algorithms, it is still very hard to assess which algorithm(s) perform better than others. We conduct a neutral, multi-faceted large-scale empirical study on state-of-the-art models and evaluation measures. We find that most models can reach…

    Submitted 29 October, 2018; v1 submitted 28 November, 2017; originally announced November 2017.

    Comments: NIPS'18: Added a section on the limitations of the study and additional empirical results

  21. arXiv:1711.01558  [pdf, other]

    stat.ML cs.LG

    Wasserstein Auto-Encoders

    Authors: Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, Bernhard Schoelkopf

    Abstract: We propose the Wasserstein Auto-Encoder (WAE), a new algorithm for building a generative model of the data distribution. WAE minimizes a penalized form of the Wasserstein distance between the model distribution and the target distribution, which leads to a different regularizer than the one used by the Variational Auto-Encoder (VAE). This regularizer encourages the encoded training distribution t…

    Submitted 5 December, 2019; v1 submitted 5 November, 2017; originally announced November 2017.

    Comments: Published at ICLR 2018. Included a much wider hyperparameter sweep, resulting in significant improvements in FIDs on CelebA

  22. arXiv:1705.08991  [pdf, other]

    cs.LG stat.ML

    Approximation and Convergence Properties of Generative Adversarial Learning

    Authors: Shuang Liu, Olivier Bousquet, Kamalika Chaudhuri

    Abstract: Generative adversarial networks (GAN) approximate a target data distribution by jointly optimizing an objective function through a "two-player game" between a generator and a discriminator. Despite their empirical success, however, two very basic questions on how well they can approximate the target distribution remain unanswered. First, it is not known how restricting the discriminator family aff…

    Submitted 24 May, 2017; originally announced May 2017.

  23. arXiv:1705.07642  [pdf, other]

    stat.ML

    From optimal transport to generative modeling: the VEGAN cookbook

    Authors: Olivier Bousquet, Sylvain Gelly, Ilya Tolstikhin, Carl-Johann Simon-Gabriel, Bernhard Schoelkopf

    Abstract: We study unsupervised generative modeling in terms of the optimal transport (OT) problem between true (but unknown) data distribution $P_X$ and the latent variable model distribution $P_G$. We show that the OT problem can be equivalently written in terms of probabilistic encoders, which are constrained to match the posterior and prior distributions over the latent space. When relaxed, this constra…

    Submitted 22 May, 2017; originally announced May 2017.

  24. arXiv:1701.02386  [pdf, other]

    stat.ML cs.LG

    AdaGAN: Boosting Generative Models

    Authors: Ilya Tolstikhin, Sylvain Gelly, Olivier Bousquet, Carl-Johann Simon-Gabriel, Bernhard Schölkopf

    Abstract: Generative Adversarial Networks (GAN) (Goodfellow et al., 2014) are an effective method for training generative models of complex data such as natural images. However, they are notoriously hard to train and can suffer from the problem of missing modes where the model is not able to produce examples in certain regions of the space. We propose an iterative procedure, called AdaGAN, where at every st…

    Submitted 24 May, 2017; v1 submitted 9 January, 2017; originally announced January 2017.

    Comments: Updated with MNIST pictures and discussions + Unrolled GAN experiments