Showing 1–50 of 82 results for author: Larochelle, H

  1. arXiv:2404.11018  [pdf, other]

    cs.LG cs.AI cs.CL

    Many-Shot In-Context Learning

    Authors: Rishabh Agarwal, Avi Singh, Lei M. Zhang, Bernd Bohnet, Luis Rosias, Stephanie Chan, Biao Zhang, Ankesh Anand, Zaheer Abbas, Azade Nova, John D. Co-Reyes, Eric Chu, Feryal Behbahani, Aleksandra Faust, Hugo Larochelle

    Abstract: Large language models (LLMs) excel at few-shot in-context learning (ICL) -- learning from a few examples provided in context at inference, without any weight updates. Newly expanded context windows allow us to investigate ICL with hundreds or thousands of examples -- the many-shot regime. Going from few-shot to many-shot, we observe significant performance gains across a wide variety of generative… ▽ More

    Submitted 22 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  2. arXiv:2311.15268  [pdf, other]

    cs.LG cs.AI

    Unlearning via Sparse Representations

    Authors: Vedant Shah, Frederik Träuble, Ashish Malik, Hugo Larochelle, Michael Mozer, Sanjeev Arora, Yoshua Bengio, Anirudh Goyal

    Abstract: Machine unlearning, which involves erasing knowledge about a forget set from a trained model, can prove to be costly and infeasible by existing techniques. We propose a nearly compute-free zero-shot unlearning technique based on a discrete representational bottleneck. We show that the proposed technique efficiently unlearns the forget set and incurs negligible damage to the model's p… ▽ More

    Submitted 26 November, 2023; originally announced November 2023.

  3. arXiv:2311.14115  [pdf, other]

    cs.LG cs.AI cs.CL

    A density estimation perspective on learning from pairwise human preferences

    Authors: Vincent Dumoulin, Daniel D. Johnson, Pablo Samuel Castro, Hugo Larochelle, Yann Dauphin

    Abstract: Learning from human feedback (LHF) -- and in particular learning from pairwise preferences -- has recently become a crucial ingredient in training large language models (LLMs), and has been the subject of much research. Most recent works frame it as a reinforcement learning problem, where a reward function is learned from pairwise preference data and the LLM is treated as a policy which is adapted… ▽ More

    Submitted 10 January, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

  4. arXiv:2311.00936  [pdf, other]

    cs.LG cs.CV q-bio.PE

    SatBird: Bird Species Distribution Modeling with Remote Sensing and Citizen Science Data

    Authors: Mélisande Teng, Amna Elmustafa, Benjamin Akera, Yoshua Bengio, Hager Radi Abdelwahed, Hugo Larochelle, David Rolnick

    Abstract: Biodiversity is declining at an unprecedented rate, impacting ecosystem services necessary to ensure food, water, and human health and well-being. Understanding the distribution of species and their habitats is crucial for conservation policy planning. However, traditional methods in ecology for species distribution models (SDMs) generally focus either on narrow sets of species or narrow geographi… ▽ More

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: 37th Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks

  5. arXiv:2305.01079  [pdf, other]

    cs.CV

    Bird Distribution Modelling using Remote Sensing and Citizen Science data

    Authors: Mélisande Teng, Amna Elmustafa, Benjamin Akera, Hugo Larochelle, David Rolnick

    Abstract: Climate change is a major driver of biodiversity loss, changing the geographic range and abundance of many species. However, there remain significant knowledge gaps about the distribution of species, due principally to the amount of effort and expertise required for traditional field monitoring. We propose an approach leveraging computer vision to improve species distribution modelling, combining… ▽ More

    Submitted 1 May, 2023; originally announced May 2023.

    Journal ref: Tackling Climate Change with Machine Learning Workshop, 11th International Conference on Learning Representations (ICLR 2023), Kigali, Rwanda

  6. arXiv:2211.09066  [pdf, other]

    cs.LG cs.AI cs.CL

    Teaching Algorithmic Reasoning via In-context Learning

    Authors: Hattie Zhou, Azade Nova, Hugo Larochelle, Aaron Courville, Behnam Neyshabur, Hanie Sedghi

    Abstract: Large language models (LLMs) have shown increasing in-context learning capabilities through scaling up model and data size. Despite this progress, LLMs are still unable to solve algorithmic reasoning problems. While providing a rationale with the final answer has led to further improvements in multi-step reasoning problems, Anil et al. 2022 showed that even simple algorithmic reasoning tasks such… ▽ More

    Submitted 15 November, 2022; originally announced November 2022.

  7. arXiv:2206.12839  [pdf, other]

    cs.LG cs.AI cs.PL cs.SE

    Repository-Level Prompt Generation for Large Language Models of Code

    Authors: Disha Shrivastava, Hugo Larochelle, Daniel Tarlow

    Abstract: With the success of large language models (LLMs) of code and their use as code assistants (e.g. Codex used in GitHub Copilot), techniques for introducing domain-specific knowledge in the prompt design process become important. In this work, we propose a framework called Repo-Level Prompt Generator that learns to generate example-specific prompts using prompt proposals. The prompt proposals take co… ▽ More

    Submitted 5 June, 2023; v1 submitted 26 June, 2022; originally announced June 2022.

    Comments: ICML 2023 (Camera-Ready version)

    Journal ref: ICML, 2023

  8. arXiv:2204.00949  [pdf, other]

    cs.CV

    Matching Feature Sets for Few-Shot Image Classification

    Authors: Arman Afrasiyabi, Hugo Larochelle, Jean-François Lalonde, Christian Gagné

    Abstract: In image classification, it is common practice to train deep networks to extract a single feature vector per input image. Few-shot classification methods also mostly follow this trend. In this work, we depart from this established direction and instead propose to extract sets of feature vectors for each image. We argue that a set-based representation intrinsically builds a richer representation of… ▽ More

    Submitted 2 April, 2022; originally announced April 2022.

    Comments: International Conference on Computer Vision and Pattern Recognition (CVPR), 2022

  9. arXiv:2203.03771  [pdf, other]

    cs.LG cs.PL

    Static Prediction of Runtime Errors by Learning to Execute Programs with External Resource Descriptions

    Authors: David Bieber, Rishab Goel, Daniel Zheng, Hugo Larochelle, Daniel Tarlow

    Abstract: The execution behavior of a program often depends on external resources, such as program inputs or file contents, and so cannot be run in isolation. Nevertheless, software developers benefit from fast iteration loops where automated tools identify errors as early as possible, even before programs can be compiled and run. This presents an interesting machine learning challenge: can we predict runti… ▽ More

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: 20 pages, 7 figures

  10. arXiv:2202.00155  [pdf, other]

    cs.LG cs.AI cs.NE

    Fortuitous Forgetting in Connectionist Networks

    Authors: Hattie Zhou, Ankit Vani, Hugo Larochelle, Aaron Courville

    Abstract: Forgetting is often seen as an unwanted characteristic in both human and machine learning. However, we propose that forgetting can in fact be favorable to learning. We introduce "forget-and-relearn" as a powerful paradigm for shaping the learning trajectories of artificial neural networks. In this process, the forgetting step selectively removes undesirable information from the model, and the rele… ▽ More

    Submitted 31 January, 2022; originally announced February 2022.

    Comments: ICLR Camera Ready

    Journal ref: ICLR 2022

  11. arXiv:2201.03529  [pdf, other]

    cs.LG cs.CV

    Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning

    Authors: Utku Evci, Vincent Dumoulin, Hugo Larochelle, Michael C. Mozer

    Abstract: Transfer-learning methods aim to improve performance in a data-scarce target domain using a model pretrained on a data-rich source domain. A cost-efficient strategy, linear probing, involves freezing the source model and training a new classification head for the target domain. This strategy is outperformed by a more costly but state-of-the-art method -- fine-tuning all parameters of the source mo… ▽ More

    Submitted 25 July, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

    Comments: presented at ICML 2022 (Oral)

    Journal ref: ICML 2022, Proceedings of the 39th International Conference on Machine Learning

  12. arXiv:2108.03489  [pdf, other]

    cs.CV cs.LG

    Impact of Aliasing on Generalization in Deep Convolutional Networks

    Authors: Cristina Vasconcelos, Hugo Larochelle, Vincent Dumoulin, Rob Romijnders, Nicolas Le Roux, Ross Goroshin

    Abstract: We investigate the impact of aliasing on generalization in Deep Convolutional Networks and show that data augmentation schemes alone are unable to prevent it due to structural limitations in widely used architectures. Drawing insights from frequency analysis theory, we take a closer look at ResNet and EfficientNet architectures and review the trade-off between aliasing and information loss in each… ▽ More

    Submitted 7 August, 2021; originally announced August 2021.

    Comments: Accepted to ICCV 2021. arXiv admin note: text overlap with arXiv:2011.10675

  13. arXiv:2106.07175  [pdf, other]

    cs.LG cs.AI cs.PL cs.SE

    Learning to Combine Per-Example Solutions for Neural Program Synthesis

    Authors: Disha Shrivastava, Hugo Larochelle, Daniel Tarlow

    Abstract: The goal of program synthesis from examples is to find a computer program that is consistent with a given set of input-output examples. Most learning-based approaches try to find a program that satisfies all examples at once. Our work, by contrast, considers an approach that breaks the problem into two stages: (a) find programs that satisfy only one example, and (b) leverage these per-example solu… ▽ More

    Submitted 1 November, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021 (camera-ready version)

  14. arXiv:2105.07029  [pdf, other]

    cs.LG cs.CV

    Learning a Universal Template for Few-shot Dataset Generalization

    Authors: Eleni Triantafillou, Hugo Larochelle, Richard Zemel, Vincent Dumoulin

    Abstract: Few-shot dataset generalization is a challenging variant of the well-studied few-shot classification problem where a diverse training set of several datasets is given, for the purpose of training an adaptable model that can then learn classes from new datasets using only a few examples. To this end, we propose to utilize the diverse training set to construct a universal template: a partial model t… ▽ More

    Submitted 21 June, 2021; v1 submitted 14 May, 2021; originally announced May 2021.

  15. arXiv:2104.02638  [pdf, other]

    cs.LG cs.CV

    Comparing Transfer and Meta Learning Approaches on a Unified Few-Shot Classification Benchmark

    Authors: Vincent Dumoulin, Neil Houlsby, Utku Evci, Xiaohua Zhai, Ross Goroshin, Sylvain Gelly, Hugo Larochelle

    Abstract: Meta and transfer learning are two successful families of approaches to few-shot learning. Despite highly related goals, state-of-the-art advances in each family are measured largely in isolation of each other. As a result of diverging evaluation norms, a direct or thorough comparison of different approaches is challenging. To bridge this gap, we perform a cross-family study of the best transfer a… ▽ More

    Submitted 6 April, 2021; originally announced April 2021.

  16. arXiv:2103.01616  [pdf, other]

    cs.CL cs.LG

    Interpretable Multi-Modal Hate Speech Detection

    Authors: Prashanth Vijayaraghavan, Hugo Larochelle, Deb Roy

    Abstract: With growing role of social media in shaping public opinions and beliefs across the world, there has been an increased attention to identify and counter the problem of hate speech on social media. Hate speech on online spaces has serious manifestations, including social polarization and hate crimes. While prior works have proposed automated techniques to detect hate speech online, these techniques… ▽ More

    Submitted 2 March, 2021; originally announced March 2021.

    Comments: 5 pages, Accepted at the International Conference on Machine Learning AI for Social Good Workshop, Long Beach, United States, 2019

    Journal ref: ICML Workshop on AI for Social Good, 2019

  17. arXiv:2102.00863  [pdf, other]

    cs.CV

    Self-Supervised Equivariant Scene Synthesis from Video

    Authors: Cinjon Resnick, Or Litany, Cosmas Heiß, Hugo Larochelle, Joan Bruna, Kyunghyun Cho

    Abstract: We propose a self-supervised framework to learn scene representations from video that are automatically delineated into background, characters, and their animations. Our method capitalizes on moving characters being equivariant with respect to their transformation across frames and the background being constant with respect to that same transformation. After training, we can manipulate image encod… ▽ More

    Submitted 1 February, 2021; originally announced February 2021.

    Comments: arXiv admin note: text overlap with arXiv:2011.05787

  18. arXiv:2011.10675  [pdf, other]

    cs.CV

    An Effective Anti-Aliasing Approach for Residual Networks

    Authors: Cristina Vasconcelos, Hugo Larochelle, Vincent Dumoulin, Nicolas Le Roux, Ross Goroshin

    Abstract: Image pre-processing in the frequency domain has traditionally played a vital role in computer vision and was even part of the standard pipeline in the early days of deep learning. However, with the advent of large datasets, many practitioners concluded that this was unnecessary due to the belief that these priors can be learned from the data itself. Frequency aliasing is a phenomenon that may occ… ▽ More

    Submitted 20 November, 2020; originally announced November 2020.

  19. arXiv:2011.05787  [pdf, other]

    cs.CV

    Learned Equivariant Rendering without Transformation Supervision

    Authors: Cinjon Resnick, Or Litany, Hugo Larochelle, Joan Bruna, Kyunghyun Cho

    Abstract: We propose a self-supervised framework to learn scene representations from video that are automatically delineated into objects and background. Our method relies on moving objects being equivariant with respect to their transformation across frames and the background being constant. After training, we can manipulate and render the scenes in real time to create unseen combinations of objects, trans… ▽ More

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: Workshop on Differentiable Vision, Graphics, and Physics in Machine Learning at NeurIPS 2020

  20. arXiv:2010.12621  [pdf, other]

    cs.LG

    Learning to Execute Programs with Instruction Pointer Attention Graph Neural Networks

    Authors: David Bieber, Charles Sutton, Hugo Larochelle, Daniel Tarlow

    Abstract: Graph neural networks (GNNs) have emerged as a powerful tool for learning software engineering tasks including code completion, bug finding, and program repair. They benefit from leveraging program structure like control flow graphs, but they are not well-suited to tasks like program execution that require far more sequential reasoning steps than number of GNN propagation steps. Recurrent neural n… ▽ More

    Submitted 23 October, 2020; originally announced October 2020.

    Comments: Accepted at NeurIPS 2020

  21. arXiv:2007.06700  [pdf, other]

    cs.LG stat.ML

    Revisiting Fundamentals of Experience Replay

    Authors: William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney

    Abstract: Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding. We therefore present a systematic and extensive analysis of experience replay in Q-learning methods, focusing on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected (replay ratio). Our additive and a… ▽ More

    Submitted 13 July, 2020; originally announced July 2020.

    Comments: Published at ICML 2020. First two authors contributed equally and code available at https://github.com/google-research/google-research/tree/master/experience_replay

  22. arXiv:2007.04929  [pdf, other]

    cs.LG stat.ML

    Learning Graph Structure With A Finite-State Automaton Layer

    Authors: Daniel D. Johnson, Hugo Larochelle, Daniel Tarlow

    Abstract: Graph-based neural network models are producing strong results in a number of domains, in part because graphs provide flexibility to encode domain knowledge in the form of relational structure (edges) between nodes in the graph. In practice, edges are used both to represent intrinsic structure (e.g., abstract syntax trees of programs) and more abstract relations that aid reasoning for a downstream… ▽ More

    Submitted 6 November, 2020; v1 submitted 9 July, 2020; originally announced July 2020.

    Comments: Accepted at NeurIPS 2020 (spotlight)

  23. arXiv:2006.16524  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    Uniform Priors for Data-Efficient Transfer

    Authors: Samarth Sinha, Karsten Roth, Anirudh Goyal, Marzyeh Ghassemi, Hugo Larochelle, Animesh Garg

    Abstract: Deep Neural Networks have shown great promise on a variety of downstream applications; but their ability to adapt and generalize to new data and tasks remains a challenge. However, the ability to perform few or zero-shot adaptation to novel tasks is important for the scalability and deployment of machine learning models. It is therefore crucial to understand what makes for good, transfer-able feat… ▽ More

    Submitted 13 October, 2020; v1 submitted 30 June, 2020; originally announced June 2020.

  24. arXiv:2006.11702  [pdf, other]

    cs.LG cs.CV stat.ML

    A Universal Representation Transformer Layer for Few-Shot Image Classification

    Authors: Lu Liu, William Hamilton, Guodong Long, Jing Jiang, Hugo Larochelle

    Abstract: Few-shot classification aims to recognize unseen classes when presented with only a small number of samples. We consider the problem of multi-domain few-shot image classification, where unseen classes and examples come from diverse data sources. This problem has seen growing interest and has inspired the development of benchmarks such as Meta-Dataset. A key challenge in this multi-domain setting i… ▽ More

    Submitted 2 September, 2020; v1 submitted 20 June, 2020; originally announced June 2020.

  25. arXiv:2003.12206  [pdf, other]

    cs.LG stat.ML

    Improving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program)

    Authors: Joelle Pineau, Philippe Vincent-Lamarre, Koustuv Sinha, Vincent Larivière, Alina Beygelzimer, Florence d'Alché-Buc, Emily Fox, Hugo Larochelle

    Abstract: One of the challenges in machine learning research is to ensure that presented and published results are sound and reliable. Reproducibility, that is obtaining similar results as presented in a paper or talk, using the same code and data (when available), is a necessary step to verify the reliability of research findings. Reproducibility is also an important step to promote open and accessible res… ▽ More

    Submitted 30 December, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

    Comments: To appear at JMLR, 16 pages + Appendix

  26. arXiv:2003.11768   

    cs.LG cs.AI cs.SE stat.ML

    On-the-Fly Adaptation of Source Code Models using Meta-Learning

    Authors: Disha Shrivastava, Hugo Larochelle, Daniel Tarlow

    Abstract: The ability to adapt to unseen, local contexts is an important challenge that successful models of source code must overcome. One of the most popular approaches for the adaptation of such models is dynamic evaluation. With dynamic evaluation, when running a model on an unseen file, the model is updated immediately after having observed each token in that file. In this work, we propose instead to f… ▽ More

    Submitted 19 September, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

    Comments: This paper has been withdrawn because we found a bug in the FOMAML implementation that invalidates some of the key claims in the paper

  27. arXiv:2003.06060  [pdf, other]

    cs.LG cs.AI stat.ML

    Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling

    Authors: Tong Che, Ruixiang Zhang, Jascha Sohl-Dickstein, Hugo Larochelle, Liam Paull, Yuan Cao, Yoshua Bengio

    Abstract: We show that the sum of the implicit generator log-density $\log p_g$ of a GAN with the logit score of the discriminator defines an energy function which yields the true data density when the generator is imperfect but the discriminator is optimal, thus making it possible to improve on the typical generator (with implicit density $p_g$). To make that practical, we show that sampling from this modi… ▽ More

    Submitted 7 July, 2021; v1 submitted 12 March, 2020; originally announced March 2020.

  28. arXiv:2003.04514  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Diversity inducing Information Bottleneck in Model Ensembles

    Authors: Samarth Sinha, Homanga Bharadhwaj, Anirudh Goyal, Hugo Larochelle, Animesh Garg, Florian Shkurti

    Abstract: Although deep learning models have achieved state-of-the-art performance on a number of vision tasks, generalization over high dimensional multi-modal data, and reliable predictive uncertainty estimation are still active areas of research. Bayesian approaches including Bayesian Neural Nets (BNNs) do not scale well to modern computer vision tasks, as they are difficult to train, and have poor gener… ▽ More

    Submitted 8 December, 2020; v1 submitted 9 March, 2020; originally announced March 2020.

    Comments: AAAI 2021. Samarth Sinha* and Homanga Bharadhwaj* contributed equally to this work

  29. arXiv:2003.01367  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    Curriculum By Smoothing

    Authors: Samarth Sinha, Animesh Garg, Hugo Larochelle

    Abstract: Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation. Moreover, recent work in Generative Adversarial Networks (GANs) has highlighted the importance of learning by progressively increasing the difficulty of a learning task [26]. When learning a network from scratch, the information propagated withi… ▽ More

    Submitted 4 January, 2021; v1 submitted 3 March, 2020; originally announced March 2020.

    Comments: NeurIPS 2020 (Spotlight)

  30. arXiv:2002.12499  [pdf, other]

    cs.LG cs.AI stat.ML

    On Catastrophic Interference in Atari 2600 Games

    Authors: William Fedus, Dibya Ghosh, John D. Martin, Marc G. Bellemare, Yoshua Bengio, Hugo Larochelle

    Abstract: Model-free deep reinforcement learning is sample inefficient. One hypothesis -- speculated, but not confirmed -- is that catastrophic interference within an environment inhibits learning. We test this hypothesis through a large-scale empirical study in the Arcade Learning Environment (ALE) and, indeed, find supporting evidence. We show that interference causes performance to plateau; the network c… ▽ More

    Submitted 9 June, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

    Comments: First two authors contributed equally. Code available to reproduce experiments at https://github.com/google-research/google-research/tree/master/memento

  31. arXiv:1911.12511  [pdf, other]

    cs.AI cs.LG

    Algorithmic Improvements for Deep Reinforcement Learning applied to Interactive Fiction

    Authors: Vishal Jain, William Fedus, Hugo Larochelle, Doina Precup, Marc G. Bellemare

    Abstract: Text-based games are a natural challenge domain for deep reinforcement learning algorithms. Their state and action spaces are combinatorially large, their reward function is sparse, and they are partially observable: the agent is informed of the consequences of its actions through textual feedback. In this paper we emphasize this latter point and consider the design of a deep reinforcement learnin… ▽ More

    Submitted 27 November, 2019; originally announced November 2019.

    Comments: To appear in Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20). Accepted for Oral presentation

  32. arXiv:1910.13540  [pdf, ps, other]

    stat.ML cs.LG

    Small-GAN: Speeding Up GAN Training Using Core-sets

    Authors: Samarth Sinha, Han Zhang, Anirudh Goyal, Yoshua Bengio, Hugo Larochelle, Augustus Odena

    Abstract: Recent work by Brock et al. (2018) suggests that Generative Adversarial Networks (GANs) benefit disproportionately from large mini-batch sizes. Unfortunately, using large batches is slow and expensive on conventional hardware. Thus, it would be nice if we could generate batches that were effectively large though actually small. In this work, we propose a method to do this, inspired by the use of C… ▽ More

    Submitted 29 October, 2019; originally announced October 2019.

  33. arXiv:1910.01075  [pdf, other]

    stat.ML cs.AI cs.LG

    Learning Neural Causal Models from Unknown Interventions

    Authors: Nan Rosemary Ke, Olexa Bilaniuk, Anirudh Goyal, Stefan Bauer, Hugo Larochelle, Bernhard Schölkopf, Michael C. Mozer, Chris Pal, Yoshua Bengio

    Abstract: Promising results have driven a recent surge of interest in continuous optimization methods for Bayesian network structure learning from observational data. However, there are theoretical limitations on the identifiability of underlying structures obtained from observational data alone. Interventional data provides much richer information about the underlying data-generating process. However, the… ▽ More

    Submitted 23 August, 2020; v1 submitted 2 October, 2019; originally announced October 2019.

  34. arXiv:1903.07714  [pdf, other]

    cs.LG stat.ML

    A RAD approach to deep mixture models

    Authors: Laurent Dinh, Jascha Sohl-Dickstein, Hugo Larochelle, Razvan Pascanu

    Abstract: Flow based models such as Real NVP are an extremely powerful approach to density estimation. However, existing flow based models are restricted to transforming continuous densities over a continuous input space into similarly continuous distributions over continuous latent variables. This makes them poorly suited for modeling and representing discrete structures in data distributions, for example… ▽ More

    Submitted 25 August, 2020; v1 submitted 18 March, 2019; originally announced March 2019.

    Comments: 18.5 pages of main content, 3 pages of appendices

  35. arXiv:1903.03096  [pdf, other]

    cs.LG stat.ML

    Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples

    Authors: Eleni Triantafillou, Tyler Zhu, Vincent Dumoulin, Pascal Lamblin, Utku Evci, Kelvin Xu, Ross Goroshin, Carles Gelada, Kevin Swersky, Pierre-Antoine Manzagol, Hugo Larochelle

    Abstract: Few-shot classification refers to learning a classifier for new classes given only a few examples. While a plethora of models have emerged to tackle it, we find the procedure and datasets that are used to assess their progress lacking. To address this limitation, we propose Meta-Dataset: a new benchmark for training and evaluating models that is large-scale, consists of diverse datasets, and prese… ▽ More

    Submitted 8 April, 2020; v1 submitted 7 March, 2019; originally announced March 2019.

    Comments: Code available at https://github.com/google-research/meta-dataset

    Journal ref: International Conference on Learning Representations (2020)

  36. arXiv:1902.08605  [pdf, other]

    cs.LG stat.ML

    Are Few-Shot Learning Benchmarks too Simple ? Solving them without Task Supervision at Test-Time

    Authors: Gabriel Huang, Hugo Larochelle, Simon Lacoste-Julien

    Abstract: We show that several popular few-shot learning benchmarks can be solved with varying degrees of success without using support set Labels at Test-time (LT). To this end, we introduce a new baseline called Centroid Networks, a modification of Prototypical Networks in which the support set labels are hidden from the method at test-time and have to be recovered through clustering. A benchmark that can… ▽ More

    Submitted 24 July, 2020; v1 submitted 22 February, 2019; originally announced February 2019.

  37. arXiv:1902.06865  [pdf, other]

    stat.ML cs.LG

    Hyperbolic Discounting and Learning over Multiple Horizons

    Authors: William Fedus, Carles Gelada, Yoshua Bengio, Marc G. Bellemare, Hugo Larochelle

    Abstract: Reinforcement learning (RL) typically defines a discount factor as part of the Markov Decision Process. The discount factor values future rewards by an exponential scheme that leads to theoretical convergence guarantees of the Bellman equation. However, evidence from psychology, economics and neuroscience suggests that humans and animals instead have hyperbolic time-preferences. In this work we re… ▽ More

    Submitted 28 February, 2019; v1 submitted 18 February, 2019; originally announced February 2019.

  38. The Hanabi Challenge: A New Frontier for AI Research

    Authors: Nolan Bard, Jakob N. Foerster, Sarath Chandar, Neil Burch, Marc Lanctot, H. Francis Song, Emilio Parisotto, Vincent Dumoulin, Subhodeep Moitra, Edward Hughes, Iain Dunning, Shibl Mourad, Hugo Larochelle, Marc G. Bellemare, Michael Bowling

    Abstract: From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making. In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with their predecessors of chess, checkers, and backgammon, these game domains… ▽ More

    Submitted 6 December, 2019; v1 submitted 1 February, 2019; originally announced February 2019.

    Comments: 32 pages, 5 figures, In Press (Artificial Intelligence)

  39. arXiv:1901.10902  [pdf, other]

    stat.ML cs.LG

    InfoBot: Transfer and Exploration via the Information Bottleneck

    Authors: Anirudh Goyal, Riashat Islam, Daniel Strouse, Zafarali Ahmed, Matthew Botvinick, Hugo Larochelle, Yoshua Bengio, Sergey Levine

    Abstract: A central challenge in reinforcement learning is discovering effective policies for tasks where rewards are sparsely distributed. We postulate that in the absence of useful reward signals, an effective exploration strategy should seek out decision states. These states lie at critical junctions in the state space from where the agent can transition to new, potentially unexplored regions. We p… ▽ More

    Submitted 5 December, 2023; v1 submitted 30 January, 2019; originally announced January 2019.

    Comments: Accepted at ICLR'19

  40. arXiv:1811.05013  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Blindfold Baselines for Embodied QA

    Authors: Ankesh Anand, Eugene Belilovsky, Kyle Kastner, Hugo Larochelle, Aaron Courville

    Abstract: We explore blindfold (question-only) baselines for Embodied Question Answering. The EmbodiedQA task requires an agent to answer a question by intelligently navigating in a simulated environment, gathering necessary visual information only through first-person vision before finally answering. Consequently, a blindfold baseline which ignores the environment and visual information is a degenerate sol… ▽ More

    Submitted 12 November, 2018; originally announced November 2018.

    Comments: NIPS 2018 Visually-Grounded Interaction and Language (ViGilL) Workshop

  41. arXiv:1811.02549  [pdf, other]

    cs.CL cs.LG

    Language GANs Falling Short

    Authors: Massimo Caccia, Lucas Caccia, William Fedus, Hugo Larochelle, Joelle Pineau, Laurent Charlin

    Abstract: Generating high-quality text with sufficient diversity is essential for a wide range of Natural Language Generation (NLG) tasks. Maximum-Likelihood (MLE) models trained with teacher forcing have consistently been reported as weak baselines, where poor performance is attributed to exposure bias (Bengio et al., 2015; Ranzato et al., 2015); at inference time, the model is fed its own prediction inste…

    Submitted 19 February, 2020; v1 submitted 6 November, 2018; originally announced November 2018.

    Journal ref: ICLR 2020 - Proceedings of the Seventh International Conference on Learning Representation

  42. arXiv:1804.00379  [pdf, other]

    cs.LG stat.ML

    Recall Traces: Backtracking Models for Efficient Reinforcement Learning

    Authors: Anirudh Goyal, Philemon Brakel, William Fedus, Soumye Singhal, Timothy Lillicrap, Sergey Levine, Hugo Larochelle, Yoshua Bengio

    Abstract: In many environments only a tiny subset of all states yield high reward. In these cases, few of the interactions with the environment provide a relevant learning signal. Hence, we may want to preferentially train on those high-reward states and the probable trajectories leading to them. To this end, we advocate for the use of a backtracking model that predicts the preceding states that terminate a…

    Submitted 28 January, 2019; v1 submitted 1 April, 2018; originally announced April 2018.

    Comments: Accepted at ICLR 2019

  43. arXiv:1803.00676  [pdf, other]

    cs.LG cs.CV stat.ML

    Meta-Learning for Semi-Supervised Few-Shot Classification

    Authors: Mengye Ren, Eleni Triantafillou, Sachin Ravi, Jake Snell, Kevin Swersky, Joshua B. Tenenbaum, Hugo Larochelle, Richard S. Zemel

    Abstract: In few-shot classification, we are interested in learning algorithms that train a classifier from only a handful of labeled examples. Recent progress in few-shot classification has featured meta-learning, in which a parameterized model for a learning algorithm is defined and trained on episodes representing different classification problems, each with a small labeled training set and its correspon…

    Submitted 1 March, 2018; originally announced March 2018.

    Comments: Published as a conference paper at ICLR 2018. 15 pages

  44. arXiv:1802.09484  [pdf, other]

    stat.ML cs.LG

    Disentangling the independently controllable factors of variation by interacting with the world

    Authors: Valentin Thomas, Emmanuel Bengio, William Fedus, Jules Pondard, Philippe Beaudoin, Hugo Larochelle, Joelle Pineau, Doina Precup, Yoshua Bengio

    Abstract: It has been postulated that a good representation is one that disentangles the underlying explanatory factors of variation. However, it remains an open question what kind of training framework could potentially achieve that. Whereas most previous work focuses on the static setting (e.g., with images), we postulate that some of the causal factors could be discovered if the learner is allowed to int…

    Submitted 26 February, 2018; originally announced February 2018.

    Comments: Presented at NIPS 2017 Learning Disentangling Representations Workshop

  45. arXiv:1711.11017  [pdf, other]

    cs.AI cs.CL cs.CV cs.RO cs.SD eess.AS

    HoME: a Household Multimodal Environment

    Authors: Simon Brodeur, Ethan Perez, Ankesh Anand, Florian Golemo, Luca Celotti, Florian Strub, Jean Rouat, Hugo Larochelle, Aaron Courville

    Abstract: We introduce HoME: a Household Multimodal Environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context. HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible…

    Submitted 29 November, 2017; originally announced November 2017.

    Comments: Presented at NIPS 2017's Visually-Grounded Interaction and Language Workshop

  46. arXiv:1707.00762  [pdf, ps, other]

    stat.ML cs.LG

    Multiscale sequence modeling with a learned dictionary

    Authors: Bart van Merriënboer, Amartya Sanyal, Hugo Larochelle, Yoshua Bengio

    Abstract: We propose a generalization of neural network sequence models. Instead of predicting one symbol at a time, our multi-scale model makes predictions over multiple, potentially overlapping multi-symbol tokens. A variation of the byte-pair encoding (BPE) compression algorithm is used to learn the dictionary of tokens that the model is trained with. When applied to language modelling, our model has the…

    Submitted 5 July, 2017; v1 submitted 3 July, 2017; originally announced July 2017.

  47. arXiv:1707.00683  [pdf, other]

    cs.CV cs.CL cs.LG

    Modulating early visual processing by language

    Authors: Harm de Vries, Florian Strub, Jérémie Mary, Hugo Larochelle, Olivier Pietquin, Aaron Courville

    Abstract: It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view dominates the current literature in computational models for language-vision tasks, where visual and linguistic input are mostly processed independently before being fused into a single representation. In this paper, we deviate from this classic pipeline and pro…

    Submitted 18 December, 2017; v1 submitted 2 July, 2017; originally announced July 2017.

    Comments: Advances in Neural Information Processing Systems 30 (NIPS 2017)

  48. arXiv:1611.08481  [pdf, other]

    cs.AI cs.CV

    GuessWhat?! Visual object discovery through multi-modal dialogue

    Authors: Harm de Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, Hugo Larochelle, Aaron Courville

    Abstract: We introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich image scene by asking a sequence of questions. Higher-level image understanding, like spatial reasoning and language grounding, is required to solve the proposed task. Our key contribution is the colle…

    Submitted 6 February, 2017; v1 submitted 23 November, 2016; originally announced November 2016.

    Comments: 23 pages; CVPR 2017 submission; see https://guesswhat.ai

  49. arXiv:1610.02365  [pdf, other]

    physics.optics physics.comp-ph

    Deep Learning with Coherent Nanophotonic Circuits

    Authors: Yichen Shen, Nicholas C. Harris, Scott Skirlo, Mihika Prabhu, Tom Baehr-Jones, Michael Hochberg, Xin Sun, Shijie Zhao, Hugo Larochelle, Dirk Englund, Marin Soljacic

    Abstract: Artificial Neural Networks are computational network models inspired by signal processing in the brain. These models have dramatically improved the performance of many learning tasks, including speech and object recognition. However, today's computing hardware is inefficient at implementing neural networks, in large part because much of it was designed for von Neumann computing schemes. Significan…

    Submitted 7 October, 2016; originally announced October 2016.

    Comments: 8 pages, 3 figures

  50. Deep learning trends for focal brain pathology segmentation in MRI

    Authors: Mohammad Havaei, Nicolas Guizard, Hugo Larochelle, Pierre-Marc Jodoin

    Abstract: Segmentation of focal (localized) brain pathologies such as brain tumors and brain lesions caused by multiple sclerosis and ischemic strokes are necessary for medical diagnosis, surgical planning and disease development as well as other applications such as tractography. Over the years, attempts have been made to automate this process for both clinical and research reasons. In this regard, machine…

    Submitted 23 January, 2017; v1 submitted 18 July, 2016; originally announced July 2016.

    Comments: Published in Machine Learning for Health Informatics