Showing 1–16 of 16 results for author: Tolstikhin, I

  1. arXiv:2208.14615  [pdf, other]

    cs.LG cs.CC stat.ML

    Fine-Grained Distribution-Dependent Learning Curves

    Authors: Olivier Bousquet, Steve Hanneke, Shay Moran, Jonathan Shafer, Ilya Tolstikhin

    Abstract: Learning curves plot the expected error of a learning algorithm as a function of the number of labeled samples it receives from a target distribution. They are widely used as a measure of an algorithm's performance, but classic PAC learning theory cannot explain their behavior. As observed by Antos and Lugosi (1996, 1998), the classic 'No Free Lunch' lower bounds only trace the upper envelope a…

    Submitted 10 November, 2022; v1 submitted 30 August, 2022; originally announced August 2022.

  2. arXiv:2107.06825  [pdf, other]

    cs.LG cs.CV

    A Generalized Lottery Ticket Hypothesis

    Authors: Ibrahim Alabdulmohsin, Larisa Markeeva, Daniel Keysers, Ilya Tolstikhin

    Abstract: We introduce a generalization to the lottery ticket hypothesis in which the notion of "sparsity" is relaxed by choosing an arbitrary basis in the space of parameters. We present evidence that the original results reported for the canonical basis continue to hold in this broader setting. We describe how structured pruning methods, including pruning units or factorizing fully-connected layers into p…

    Submitted 26 July, 2021; v1 submitted 3 July, 2021; originally announced July 2021.

    Comments: Workshop on Sparsity in Neural Networks: Advancing Understanding and Practice (SNN'21). Updates: New curve on Figure 2(left) and discussion on Li et al

    MSC Class: 68T05 ACM Class: I.2.6; I.2.10

  3. arXiv:2105.01601  [pdf, other]

    cs.CV cs.AI cs.LG

    MLP-Mixer: An all-MLP Architecture for Vision

    Authors: Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy

    Abstract: Convolutional Neural Networks (CNNs) are the go-to model for computer vision. Recently, attention-based networks, such as the Vision Transformer, have also become popular. In this paper we show that while convolutions and attention are both sufficient for good performance, neither of them are necessary. We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs). MLP-…

    Submitted 11 June, 2021; v1 submitted 4 May, 2021; originally announced May 2021.

    Comments: v2: Fixed parameter counts in Table 1. v3: Added results on JFT-3B in Figure 2(right); Added Section 3.4 on the input permutations. v4: Updated the x label in Figure 2(right)

  4. arXiv:2006.10455  [pdf, other]

    stat.ML cs.LG

    What Do Neural Networks Learn When Trained With Random Labels?

    Authors: Hartmut Maennel, Ibrahim Alabdulmohsin, Ilya Tolstikhin, Robert J. N. Baldock, Olivier Bousquet, Sylvain Gelly, Daniel Keysers

    Abstract: We study deep neural networks (DNNs) trained on natural image data with entirely random labels. Despite its popularity in the literature, where it is often used to study memorization, generalization, and other phenomena, little is known about what DNNs learn in this setting. In this paper, we show analytically for convolutional and fully connected networks that an alignment between the principal c…

    Submitted 11 November, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: Accepted, NeurIPS2020

  5. arXiv:2002.11448  [pdf, other]

    stat.ML cs.LG

    Predicting Neural Network Accuracy from Weights

    Authors: Thomas Unterthiner, Daniel Keysers, Sylvain Gelly, Olivier Bousquet, Ilya Tolstikhin

    Abstract: We show experimentally that the accuracy of a trained neural network can be predicted surprisingly well by looking only at its weights, without evaluating it on input data. We motivate this task and introduce a formal setting for it. Even when using simple statistics of the weights, the predictors are able to rank neural networks by their performance with very high accuracy (R2 score more than 0.9…

    Submitted 9 April, 2021; v1 submitted 26 February, 2020; originally announced February 2020.

    Comments: Updated the Small CNN Zoo dataset: reduced the maximal learning rate and got rid of multiple bad runs. Replaced all the experiments with the new numbers. Added MLP. Fixed typo in the abstract (R2 score instead of Kendall's tau). Added several earlier related works to the literature overview

  6. arXiv:1905.11866  [pdf, ps, other]

    cs.LG stat.ML

    When can unlabeled data improve the learning rate?

    Authors: Christina Göpfert, Shai Ben-David, Olivier Bousquet, Sylvain Gelly, Ilya Tolstikhin, Ruth Urner

    Abstract: In semi-supervised classification, one is given access both to labeled and unlabeled data. As unlabeled data is typically cheaper to acquire than labeled data, this setup becomes advantageous as soon as one can exploit the unlabeled data in order to produce a better classifier than with labeled data alone. However, the conditions under which such an improvement is possible are not fully understood…

    Submitted 9 February, 2022; v1 submitted 28 May, 2019; originally announced May 2019.

    Comments: Small correction in proof of Theorem 1

    Journal ref: Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:1500-1518, 2019

  7. arXiv:1905.11112  [pdf, other]

    stat.ML cs.IT cs.LG

    Practical and Consistent Estimation of f-Divergences

    Authors: Paul K. Rubenstein, Olivier Bousquet, Josip Djolonga, Carlos Riquelme, Ilya Tolstikhin

    Abstract: The estimation of an f-divergence between two probability distributions based on samples is a fundamental problem in statistics and machine learning. Most works study this problem under very weak assumptions, in which case it is provably hard. We consider the case of stronger structural assumptions that are commonly satisfied in modern machine learning, including representation learning and genera…

    Submitted 24 October, 2019; v1 submitted 27 May, 2019; originally announced May 2019.

    Comments: Accepted to the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

    Journal ref: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

  8. arXiv:1901.11015  [pdf, other]

    q-bio.GN cs.LG stat.ML

    GeNet: Deep Representations for Metagenomics

    Authors: Mateo Rojas-Carulla, Ilya Tolstikhin, Guillermo Luque, Nicholas Youngblut, Ruth Ley, Bernhard Schölkopf

    Abstract: We introduce GeNet, a method for shotgun metagenomic classification from raw DNA sequences that exploits the known hierarchical structure between labels for training. We provide a comparison with state-of-the-art methods Kraken and Centrifuge on datasets obtained from several sequencing technologies, in which dataset shift occurs. We show that GeNet obtains competitive precision and good recall, w…

    Submitted 30 January, 2019; originally announced January 2019.

  9. arXiv:1804.11130  [pdf, other]

    cs.LG cs.AI stat.ML

    Competitive Training of Mixtures of Independent Deep Generative Models

    Authors: Francesco Locatello, Damien Vincent, Ilya Tolstikhin, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf

    Abstract: A common assumption in causal modeling posits that the data is generated by a set of independent mechanisms, and algorithms should aim to recover this structure. Standard unsupervised learning, however, is often concerned with training a single model to capture the overall distribution or aspects thereof. Inspired by clustering approaches, we consider mixtures of implicit generative models that ``…

    Submitted 3 March, 2019; v1 submitted 30 April, 2018; originally announced April 2018.

  10. arXiv:1802.03761  [pdf, other]

    stat.ML cs.LG

    On the Latent Space of Wasserstein Auto-Encoders

    Authors: Paul K. Rubenstein, Bernhard Schoelkopf, Ilya Tolstikhin

    Abstract: We study the role of latent space dimensionality in Wasserstein auto-encoders (WAEs). Through experimentation on synthetic and real datasets, we argue that random encoders should be preferred over deterministic encoders. We highlight the potential of WAEs for representation learning with promising results on a benchmark disentanglement task.

    Submitted 11 February, 2018; originally announced February 2018.

  11. arXiv:1711.01558  [pdf, other]

    stat.ML cs.LG

    Wasserstein Auto-Encoders

    Authors: Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, Bernhard Schoelkopf

    Abstract: We propose the Wasserstein Auto-Encoder (WAE)---a new algorithm for building a generative model of the data distribution. WAE minimizes a penalized form of the Wasserstein distance between the model distribution and the target distribution, which leads to a different regularizer than the one used by the Variational Auto-Encoder (VAE). This regularizer encourages the encoded training distribution t…

    Submitted 5 December, 2019; v1 submitted 5 November, 2017; originally announced November 2017.

    Comments: Published at ICLR 2018. Included much wider hyperparameter sweep: in significant improvements in FIDs on CelebA

  12. arXiv:1706.10234  [pdf, other]

    stat.ML cs.AI cs.LG

    Probabilistic Active Learning of Functions in Structural Causal Models

    Authors: Paul K. Rubenstein, Ilya Tolstikhin, Philipp Hennig, Bernhard Schoelkopf

    Abstract: We consider the problem of learning the functions computing children from parents in a Structural Causal Model once the underlying causal graph has been identified. This is in some sense the second step after causal discovery. Taking a probabilistic approach to estimating these functions, we derive a natural myopic active learning scheme that identifies the intervention which is optimally informat…

    Submitted 30 June, 2017; originally announced June 2017.

    Comments: 9 pages main text + 4 pages supplement

  13. arXiv:1701.02386  [pdf, other]

    stat.ML cs.LG

    AdaGAN: Boosting Generative Models

    Authors: Ilya Tolstikhin, Sylvain Gelly, Olivier Bousquet, Carl-Johann Simon-Gabriel, Bernhard Schölkopf

    Abstract: Generative Adversarial Networks (GAN) (Goodfellow et al., 2014) are an effective method for training generative models of complex data such as natural images. However, they are notoriously hard to train and can suffer from the problem of missing modes where the model is not able to produce examples in certain regions of the space. We propose an iterative procedure, called AdaGAN, where at every st…

    Submitted 24 May, 2017; v1 submitted 9 January, 2017; originally announced January 2017.

    Comments: Updated with MNIST pictures and discussions + Unrolled GAN experiments

  14. arXiv:1602.03027  [pdf, ps, other]

    stat.ML cs.LG

    Minimax Lower Bounds for Realizable Transductive Classification

    Authors: Ilya Tolstikhin, David Lopez-Paz

    Abstract: Transductive learning considers a training set of $m$ labeled samples and a test set of $u$ unlabeled samples, with the goal of best labeling that particular test set. Conversely, inductive learning considers a training set of $m$ labeled samples drawn iid from $P(X,Y)$, with the goal of best labeling any future samples drawn iid from $P(X)$. This comparison suggests that transduction is a much ea…

    Submitted 9 February, 2016; originally announced February 2016.

  15. arXiv:1505.02910  [pdf, ps, other]

    stat.ML cs.LG

    Permutational Rademacher Complexity: a New Complexity Measure for Transductive Learning

    Authors: Ilya Tolstikhin, Nikita Zhivotovskiy, Gilles Blanchard

    Abstract: Transductive learning considers situations when a learner observes $m$ labelled training points and $u$ unlabelled test points with the final goal of giving correct answers for the test points. This paper introduces a new complexity measure for transductive learning called Permutational Rademacher Complexity (PRC) and studies its properties. A novel symmetrization inequality is proved, which shows…

    Submitted 23 February, 2016; v1 submitted 12 May, 2015; originally announced May 2015.

    Comments: Corrected error in Inequality (1)

  16. arXiv:1411.7200  [pdf, ps, other]

    stat.ML cs.LG

    Localized Complexities for Transductive Learning

    Authors: Ilya Tolstikhin, Gilles Blanchard, Marius Kloft

    Abstract: We show two novel concentration inequalities for suprema of empirical processes when sampling without replacement, which both take the variance of the functions into account. While these inequalities may potentially have broad applications in learning theory in general, we exemplify their significance by studying the transductive setting of learning theory. For which we provide the first excess ri…

    Submitted 26 November, 2014; originally announced November 2014.

    Comments: Appeared in Conference on Learning Theory 2014