Showing 1–50 of 182 results for author: Courville, A

  1. arXiv:2406.18043  [pdf, other]

    cs.AI cs.CV cs.LG cs.RO

    Multimodal foundation world models for generalist embodied agents

    Authors: Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt, Aaron Courville, Sai Rajeswar

    Abstract: Learning generalist embodied agents, able to solve multitudes of tasks in different domains is a long-standing problem. Reinforcement learning (RL) is hard to scale up as it requires a complex reward design for each task. In contrast, language can specify tasks in a more natural way. Current foundation vision-language models (VLMs) generally require fine-tuning or other adaptations to be functiona…

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2406.17523  [pdf, other]

    cs.LG cs.AI

    On the consistency of hyper-parameter selection in value-based deep reinforcement learning

    Authors: Johan Obando-Ceron, João G. M. Araújo, Aaron Courville, Pablo Samuel Castro

    Abstract: Deep reinforcement learning (deep RL) has achieved tremendous success on various domains through a combination of algorithmic design and careful selection of hyper-parameters. Algorithmic improvements are often the result of iterative enhancements built upon prior approaches, while hyper-parameter choices are typically inherited from previous methods or fine-tuned specifically for the proposed tec…

    Submitted 2 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2406.14662  [pdf, other]

    cs.LG

    Advantage Alignment Algorithms

    Authors: Juan Agustin Duque, Milad Aghajohari, Tim Cooijmans, Tianyu Zhang, Aaron Courville

    Abstract: The growing presence of artificially intelligent agents in everyday decision-making, from LLM assistants to autonomous vehicles, hints at a future in which conflicts may arise from each agent optimizing individual interests. In general-sum games these conflicts are apparent, where naive Reinforcement Learning agents get stuck in Pareto-suboptimal Nash equilibria. Consequently, opponent shaping has…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 20 Pages, 6 figures

  4. arXiv:2405.04342  [pdf, other]

    cs.LG

    The Curse of Diversity in Ensemble-Based Exploration

    Authors: Zhixuan Lin, Pierluca D'Oro, Evgenii Nikishin, Aaron Courville

    Abstract: We uncover a surprising phenomenon in deep reinforcement learning: training a diverse ensemble of data-sharing agents -- a well-established exploration strategy -- can significantly impair the performance of the individual ensemble members when compared to standard single-agent training. Through careful analysis, we attribute the degradation in performance to the low proportion of self-generated d…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Published as a conference paper at ICLR 2024

  5. arXiv:2405.01035  [pdf, other]

    cs.GT cs.AI cs.LG

    LOQA: Learning with Opponent Q-Learning Awareness

    Authors: Milad Aghajohari, Juan Agustin Duque, Tim Cooijmans, Aaron Courville

    Abstract: In various real-world scenarios, interactions among agents often resemble the dynamics of general-sum games, where each agent strives to optimize its own utility. Despite the ubiquitous relevance of such settings, decentralized machine learning algorithms have struggled to find equilibria that maximize individual utility while preserving social welfare. In this paper we introduce Learning with Opp…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: accepted to ICLR but still not in proceedings https://openreview.net/forum?id=FDQF6A1s6M

  6. arXiv:2405.00740  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Modeling Caption Diversity in Contrastive Vision-Language Pretraining

    Authors: Samuel Lavoie, Polina Kirichenko, Mark Ibrahim, Mahmoud Assran, Andrew Gordon Wilson, Aaron Courville, Nicolas Ballas

    Abstract: There are a thousand ways to caption an image. Contrastive Language Pretraining (CLIP) on the other hand, works by mapping an image and its caption to a single vector -- limiting how well CLIP-like models can represent the diverse ways to describe an image. In this work, we introduce Llip, Latent Language Image Pretraining, which models the diversity of captions that could match an image. Llip's v…

    Submitted 14 May, 2024; v1 submitted 29 April, 2024; originally announced May 2024.

    Comments: 14 pages, 8 figures, 7 tables, to be published at ICML2024

  7. arXiv:2404.15721  [pdf, other]

    cs.CV cs.AI

    SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision

    Authors: Ankit Vani, Bac Nguyen, Samuel Lavoie, Ranjay Krishna, Aaron Courville

    Abstract: Selective attention helps us focus on task-relevant aspects in the constant flood of our sensory input. This constraint in our perception allows us to robustly generalize under distractions and to new compositions of perceivable concepts. Transformers employ a similar notion of attention in their architecture, but representation learning models with transformer backbones like CLIP and DINO often f…

    Submitted 24 April, 2024; originally announced April 2024.

  8. arXiv:2404.06519  [pdf, other]

    cs.GT cs.AI cs.LG cs.MA

    Best Response Shaping

    Authors: Milad Aghajohari, Tim Cooijmans, Juan Agustin Duque, Shunichi Akatsuka, Aaron Courville

    Abstract: We investigate the challenge of multi-agent deep reinforcement learning in partially competitive environments, where traditional methods struggle to foster reciprocity-based cooperation. LOLA and POLA agents learn reciprocity-based cooperative policies by differentiation through a few look-ahead optimization steps of their opponent. However, there is a key limitation in these techniques. Because t…

    Submitted 5 April, 2024; originally announced April 2024.

  9. arXiv:2403.08245  [pdf, other]

    cs.LG cs.DC

    Scattered Mixture-of-Experts Implementation

    Authors: Shawn Tan, Yikang Shen, Rameswar Panda, Aaron Courville

    Abstract: We present ScatterMoE, an implementation of Sparse Mixture-of-Experts (SMoE) on GPUs. ScatterMoE builds upon existing implementations, and overcoming some of the limitations to improve inference and training speed, and memory footprint. This implementation achieves this by avoiding padding and making excessive copies of the input. We introduce ParallelLinear, the main component we use to build our…

    Submitted 13 March, 2024; originally announced March 2024.

  10. arXiv:2402.12479  [pdf, other]

    cs.LG cs.AI

    In value-based deep reinforcement learning, a pruned network is a good network

    Authors: Johan Obando-Ceron, Aaron Courville, Pablo Samuel Castro

    Abstract: Recent work has shown that deep reinforcement learning agents have difficulty in effectively using their network parameters. We leverage prior insights into the advantages of sparse training techniques and demonstrate that gradual magnitude pruning enables value-based agents to maximize parameter effectiveness. This results in networks that yield dramatic performance improvements over traditional…

    Submitted 25 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  11. arXiv:2402.06457  [pdf, other]

    cs.LG cs.AI cs.CL

    V-STaR: Training Verifiers for Self-Taught Reasoners

    Authors: Arian Hosseini, Xingdi Yuan, Nikolay Malkin, Aaron Courville, Alessandro Sordoni, Rishabh Agarwal

    Abstract: Common self-improvement approaches for large language models (LLMs), such as STaR (Zelikman et al., 2022), iteratively fine-tune LLMs on self-generated solutions to improve their problem-solving ability. However, these approaches discard the large amounts of incorrect solutions generated during this process, potentially neglecting valuable information in such solutions. To address this shortcoming…

    Submitted 9 February, 2024; originally announced February 2024.

  12. arXiv:2312.07551  [pdf, other]

    cs.CL

    Language Model Alignment with Elastic Reset

    Authors: Michael Noukhovitch, Samuel Lavoie, Florian Strub, Aaron Courville

    Abstract: Finetuning language models with reinforcement learning (RL), e.g. from human feedback (HF), is a prominent method for alignment. But optimizing against a reward model can improve on reward while degrading performance in other areas, a phenomenon known as reward hacking, alignment tax, or language drift. First, we argue that commonly-used test metrics are insufficient and instead measure how differ…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: Published at NeurIPS 2023

  13. arXiv:2311.17894  [pdf, other]

    cond-mat.mes-hall cond-mat.mtrl-sci cs.LG

    Learning and Controlling Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy

    Authors: Max Schwarzer, Jesse Farebrother, Joshua Greaves, Ekin Dogus Cubuk, Rishabh Agarwal, Aaron Courville, Marc G. Bellemare, Sergei Kalinin, Igor Mordatch, Pablo Samuel Castro, Kevin M. Roccapriore

    Abstract: We introduce a machine learning approach to determine the transition dynamics of silicon atoms on a single layer of carbon atoms, when stimulated by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural n…

    Submitted 21 November, 2023; originally announced November 2023.

  14. arXiv:2310.18777  [pdf, other]

    cs.LG cs.AI

    Improving Compositional Generalization Using Iterated Learning and Simplicial Embeddings

    Authors: Yi Ren, Samuel Lavoie, Mikhail Galkin, Danica J. Sutherland, Aaron Courville

    Abstract: Compositional generalization, the ability of an agent to generalize to unseen combinations of latent factors, is easy for humans but hard for deep neural networks. A line of research in cognitive science has hypothesized a process, ``iterated learning,'' to help explain how human language developed this ability; the theory rests on simultaneous pressures towards compressibility (when an ignorant a…

    Submitted 28 October, 2023; originally announced October 2023.

  15. arXiv:2310.18555  [pdf, other]

    cs.LG

    Group Robust Classification Without Any Group Information

    Authors: Christos Tsirigotis, Joao Monteiro, Pau Rodriguez, David Vazquez, Aaron Courville

    Abstract: Empirical risk minimization (ERM) is sensitive to spurious correlations in the training data, which poses a significant risk when deploying systems trained under this paradigm in high-stake applications. While the existing literature focuses on maximizing group-balanced or worst-group accuracy, estimating these accuracies is hindered by costly bias annotations. This study contends that current bia…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: Accepted at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). Code is available at https://github.com/tsirif/uLA

  16. arXiv:2310.07096  [pdf, other]

    cs.CL cs.AI

    Sparse Universal Transformer

    Authors: Shawn Tan, Yikang Shen, Zhenfang Chen, Aaron Courville, Chuang Gan

    Abstract: The Universal Transformer (UT) is a variant of the Transformer that shares parameters across its layers. Empirical evidence shows that UTs have better compositional generalization than Vanilla Transformers (VTs) in formal language tasks. The parameter-sharing also affords it better parameter efficiency than VTs. Despite its many advantages, scaling UT parameters is much more compute and memory int…

    Submitted 10 October, 2023; originally announced October 2023.

  17. arXiv:2310.02679  [pdf, other]

    cs.LG cs.AI stat.CO stat.ME stat.ML

    Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization

    Authors: Dinghuai Zhang, Ricky T. Q. Chen, Cheng-Hao Liu, Aaron Courville, Yoshua Bengio

    Abstract: We tackle the problem of sampling from intractable high-dimensional density functions, a fundamental task that often appears in machine learning and statistics. We extend recent sampling-based approaches that leverage controlled stochastic processes to model approximate samples from these target densities. The main drawback of these approaches is that the training objective requires full trajector…

    Submitted 9 March, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted by ICLR 2024

  18. arXiv:2307.08863  [pdf, other]

    cs.LG cs.MA

    Meta-Value Learning: a General Framework for Learning with Learning Awareness

    Authors: Tim Cooijmans, Milad Aghajohari, Aaron Courville

    Abstract: Gradient-based learning in multi-agent systems is difficult because the gradient derives from a first-order model which does not account for the interaction between agents' learning processes. LOLA (arXiv:1709.04326) accounts for this by differentiating through one step of optimization. We propose to judge joint policies by their long-term prospects as measured by the meta-value, a discounted sum…

    Submitted 11 December, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

  19. arXiv:2305.19452  [pdf, other]

    cs.LG cs.AI

    Bigger, Better, Faster: Human-level Atari with human-level efficiency

    Authors: Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro

    Abstract: We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a dis…

    Submitted 13 November, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: ICML 2023, revised version

  20. arXiv:2305.17010  [pdf, other]

    cs.LG cs.AI cs.DM stat.ML

    Let the Flows Tell: Solving Graph Combinatorial Optimization Problems with GFlowNets

    Authors: Dinghuai Zhang, Hanjun Dai, Nikolay Malkin, Aaron Courville, Yoshua Bengio, Ling Pan

    Abstract: Combinatorial optimization (CO) problems are often NP-hard and thus out of reach for exact algorithms, making them a tempting domain to apply machine learning methods. The highly structured constraints in these problems can hinder either optimization or sampling directly in the solution space. On the other hand, GFlowNets have recently emerged as a powerful machinery to efficiently sample from com…

    Submitted 20 November, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted by NeurIPS 2023 as spotlight

  21. arXiv:2302.05793  [pdf, other]

    cs.LG cs.AI stat.CO stat.ML

    Distributional GFlowNets with Quantile Flows

    Authors: Dinghuai Zhang, Ling Pan, Ricky T. Q. Chen, Aaron Courville, Yoshua Bengio

    Abstract: Generative Flow Networks (GFlowNets) are a new family of probabilistic samplers where an agent learns a stochastic policy for generating complex combinatorial structure through a series of decision-making steps. Despite being inspired from reinforcement learning, the current GFlowNet framework is relatively limited in its applicability and cannot handle stochasticity in the reward function. In thi…

    Submitted 17 February, 2024; v1 submitted 11 February, 2023; originally announced February 2023.

    Comments: Accepted by TMLR

  22. arXiv:2302.00695  [pdf, other]

    cs.LG hep-ex hep-ph stat.ML

    Versatile Energy-Based Probabilistic Models for High Energy Physics

    Authors: Taoli Cheng, Aaron Courville

    Abstract: As a classical generative modeling approach, energy-based models have the natural advantage of flexibility in the form of the energy function. Recently, energy-based models have achieved great success in modeling high-dimensional data in computer vision and natural language processing. In line with these advancements, we build a multi-purpose energy-based probabilistic model for High Energy Physic…

    Submitted 18 January, 2024; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: 17 pages, 9 figures. NeurIPS 2023 camera ready

  23. arXiv:2211.09066  [pdf, other]

    cs.LG cs.AI cs.CL

    Teaching Algorithmic Reasoning via In-context Learning

    Authors: Hattie Zhou, Azade Nova, Hugo Larochelle, Aaron Courville, Behnam Neyshabur, Hanie Sedghi

    Abstract: Large language models (LLMs) have shown increasing in-context learning capabilities through scaling up model and data size. Despite this progress, LLMs are still unable to solve algorithmic reasoning problems. While providing a rationale with the final answer has led to further improvements in multi-step reasoning problems, Anil et al. 2022 showed that even simple algorithmic reasoning tasks such…

    Submitted 15 November, 2022; originally announced November 2022.

  24. arXiv:2211.08473  [pdf, other]

    cs.CL cs.LG

    On the Compositional Generalization Gap of In-Context Learning

    Authors: Arian Hosseini, Ankit Vani, Dzmitry Bahdanau, Alessandro Sordoni, Aaron Courville

    Abstract: Pretrained large generative language models have shown great performance on many tasks, but exhibit low compositional generalization abilities. Scaling such models has been shown to improve their performance on various NLP tasks even just by conditioning them on a few examples to solve the task without any fine-tuning (also known as in-context learning). In this work, we look at the gap between th…

    Submitted 15 November, 2022; originally announced November 2022.

  25. arXiv:2210.03308  [pdf, other]

    cs.LG

    Generative Augmented Flow Networks

    Authors: Ling Pan, Dinghuai Zhang, Aaron Courville, Longbo Huang, Yoshua Bengio

    Abstract: The Generative Flow Network is a probabilistic framework where an agent learns a stochastic policy for object generation, such that the probability of generating an object is proportional to a given reward function. Its effectiveness has been shown in discovering high-quality and diverse solutions, compared to reward-maximizing reinforcement learning-based methods. Nonetheless, GFlowNets only lear…

    Submitted 6 October, 2022; originally announced October 2022.

  26. arXiv:2210.00999  [pdf, other]

    cs.LG cs.AI stat.ML

    Latent State Marginalization as a Low-cost Approach for Improving Exploration

    Authors: Dinghuai Zhang, Aaron Courville, Yoshua Bengio, Qinqing Zheng, Amy Zhang, Ricky T. Q. Chen

    Abstract: While the maximum entropy (MaxEnt) reinforcement learning (RL) framework -- often touted for its exploration and robustness capabilities -- is usually motivated from a probabilistic perspective, the use of deep probabilistic models has not gained much traction in practice due to their inherent complexity. In this work, we propose the adoption of latent variable policies within the MaxEnt framework…

    Submitted 10 February, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: Accepted by ICLR 2023

  27. arXiv:2209.12016  [pdf, other]

    cs.AI cs.LG

    Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels

    Authors: Sai Rajeswar, Pietro Mazzaglia, Tim Verbelen, Alexandre Piché, Bart Dhoedt, Aaron Courville, Alexandre Lacoste

    Abstract: Controlling artificial agents from visual sensory data is an arduous task. Reinforcement learning (RL) algorithms can succeed but require large amounts of interactions between the agent and the environment. To alleviate the issue, unsupervised RL proposes to employ self-supervised interaction and learning, for adapting faster to future tasks. Yet, as shown in the Unsupervised RL Benchmark (URLB; L…

    Submitted 24 May, 2023; v1 submitted 24 September, 2022; originally announced September 2022.

    Comments: Accepted at ICML 2023 (oral)

  28. arXiv:2208.07949  [pdf, other]

    cs.LG

    Riemannian Diffusion Models

    Authors: Chin-Wei Huang, Milad Aghajohari, Avishek Joey Bose, Prakash Panangaden, Aaron Courville

    Abstract: Diffusion models are recent state-of-the-art methods for image generation and likelihood estimation. In this work, we generalize continuous-time diffusion models to arbitrary Riemannian manifolds and derive a variational framework for likelihood estimation. Computationally, we propose new methods for computing the Riemannian divergence which is needed in the likelihood estimation. Moreover, in gen…

    Submitted 16 August, 2022; originally announced August 2022.

  29. arXiv:2206.15276  [pdf, other]

    cs.SD cs.LG eess.AS

    R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS

    Authors: Kyle Kastner, Aaron Courville

    Abstract: This paper introduces R-MelNet, a two-part autoregressive architecture with a frontend based on the first tier of MelNet and a backend WaveRNN-style audio decoder for neural text-to-speech synthesis. Taking as input a mixed sequence of characters and phonemes, with an optional audio priming sequence, this model produces low-resolution mel-spectral features which are interpolated and used by a Wave…

    Submitted 30 June, 2022; originally announced June 2022.

  30. arXiv:2206.03362  [pdf, other]

    cs.LG cs.AI cs.CR stat.ME stat.ML

    Building Robust Ensembles via Margin Boosting

    Authors: Dinghuai Zhang, Hongyang Zhang, Aaron Courville, Yoshua Bengio, Pradeep Ravikumar, Arun Sai Suggala

    Abstract: In the context of adversarial robustness, a single model does not usually have enough power to defend against all possible adversarial attacks, and as a result, has sub-optimal robustness. Consequently, an emerging line of work has focused on learning an ensemble of neural networks to defend against adversarial attacks. In this work, we take a principled approach towards building robust ensembles.…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: Accepted by ICML 2022

  31. arXiv:2206.01626  [pdf, other]

    cs.LG cs.AI stat.ML

    Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress

    Authors: Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

    Abstract: Learning tabula rasa, that is without any prior knowledge, is the prevalent workflow in reinforcement learning (RL) research. However, RL systems, when applied to large-scale settings, rarely operate tabula rasa. Such large-scale systems undergo multiple design or algorithmic changes during their development cycle and use ad hoc approaches for incorporating these changes without re-training from s…

    Submitted 4 October, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022. Code and agents at https://agarwl.github.io/reincarnating_rl

  32. arXiv:2206.01251  [pdf, other]

    cs.LG cs.AI cs.CV

    Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods

    Authors: Yuchen Lu, Zhen Liu, Aristide Baratin, Romain Laroche, Aaron Courville, Alessandro Sordoni

    Abstract: We address the problem of evaluating the quality of self-supervised learning (SSL) models without access to supervised labels, while being agnostic to the architecture, learning algorithm or data manipulation used during training. We argue that representations can be evaluated through the lens of expressiveness and learnability. We propose to use the Intrinsic Dimension (ID) to assess expressivene…

    Submitted 14 November, 2023; v1 submitted 2 June, 2022; originally announced June 2022.

    Journal ref: TMLR 2023 -- Transactions of Machine Learning Research, 11/2023

  33. arXiv:2206.00735  [pdf, other]

    cs.CV cs.LG

    Cascaded Video Generation for Videos In-the-Wild

    Authors: Lluis Castrejon, Nicolas Ballas, Aaron Courville

    Abstract: Videos can be created by first outlining a global view of the scene and then adding local details. Inspired by this idea we propose a cascaded model for video generation which follows a coarse to fine approach. First our model generates a low resolution video, establishing the global scene structure, which is then refined by subsequent cascade levels operating at larger resolutions. We train each…

    Submitted 1 June, 2022; originally announced June 2022.

    Comments: Accepted to the 26th International Conference on Pattern Recognition (ICPR 2022). arXiv admin note: substantial text overlap with arXiv:2106.02719

  34. arXiv:2205.07802  [pdf, other]

    cs.LG cs.AI stat.ML

    The Primacy Bias in Deep Reinforcement Learning

    Authors: Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville

    Abstract: This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later. Because of training on progressively growing datasets, deep RL agents incur a risk of overfitting to earlier experiences, negatively affecting the rest of the learning process. Inspired by cognitive science, we refer to this effec…

    Submitted 16 May, 2022; originally announced May 2022.

    Comments: ICML 2022; code at https://github.com/evgenii-nikishin/rl_with_resets

  35. arXiv:2204.00616  [pdf, other]

    cs.LG cs.CV

    Simplicial Embeddings in Self-Supervised Learning and Downstream Classification

    Authors: Samuel Lavoie, Christos Tsirigotis, Max Schwarzer, Ankit Vani, Michael Noukhovitch, Kenji Kawaguchi, Aaron Courville

    Abstract: Simplicial Embeddings (SEM) are representations learned through self-supervised learning (SSL), wherein a representation is projected into $L$ simplices of $V$ dimensions each using a softmax operation. This procedure conditions the representation onto a constrained space during pretraining and imparts an inductive bias for group sparsity. For downstream classification, we formally prove that the…

    Submitted 30 September, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

    Comments: 30 pages, 8 figures, Preprint

  36. arXiv:2202.01361  [pdf, other]

    cs.LG stat.ML

    Generative Flow Networks for Discrete Probabilistic Modeling

    Authors: Dinghuai Zhang, Nikolay Malkin, Zhen Liu, Alexandra Volokhova, Aaron Courville, Yoshua Bengio

    Abstract: We present energy-based generative flow networks (EB-GFN), a novel probabilistic modeling algorithm for high-dimensional discrete data. Building upon the theory of generative flow networks (GFlowNets), we model the generation process by a stochastic data construction policy and thus amortize expensive MCMC exploration into a fixed number of actions sampled from a GFlowNet. We show how GFlowNets ca…

    Submitted 8 June, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: Accepted by ICML 2022

  37. arXiv:2202.00155  [pdf, other]

    cs.LG cs.AI cs.NE

    Fortuitous Forgetting in Connectionist Networks

    Authors: Hattie Zhou, Ankit Vani, Hugo Larochelle, Aaron Courville

    Abstract: Forgetting is often seen as an unwanted characteristic in both human and machine learning. However, we propose that forgetting can in fact be favorable to learning. We introduce "forget-and-relearn" as a powerful paradigm for shaping the learning trajectories of artificial neural networks. In this process, the forgetting step selectively removes undesirable information from the model, and the rele…

    Submitted 31 January, 2022; originally announced February 2022.

    Comments: ICLR Camera Ready

    Journal ref: ICLR 2022

  38. arXiv:2201.07199  [pdf, other]

    hep-ph cs.LG hep-ex

    Invariant Representation Driven Neural Classifier for Anti-QCD Jet Tagging

    Authors: Taoli Cheng, Aaron Courville

    Abstract: We leverage representation learning and the inductive bias in neural-net-based Standard Model jet classification tasks, to detect non-QCD signal jets. In establishing the framework for classification-based anomaly detection in jet physics, we demonstrate that, with a \emph{well-calibrated} and \emph{powerful enough feature extractor}, a well-trained \emph{mass-decorrelated} supervised Standard Mod…

    Submitted 17 October, 2022; v1 submitted 18 January, 2022; originally announced January 2022.

    Comments: 32 pages, 15 figures. To appear in the Journal of High Energy Physics

    Journal ref: JHEP 10 (2022) 152

  39. arXiv:2112.09312  [pdf, other]

    cs.SD cs.LG eess.AS

    MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling

    Authors: Yusong Wu, Ethan Manilow, Yi Deng, Rigel Swavely, Kyle Kastner, Tim Cooijmans, Aaron Courville, Cheng-Zhi Anna Huang, Jesse Engel

    Abstract: Musical expression requires control of both what notes are played, and how they are performed. Conventional audio synthesizers provide detailed expressive controls, but at the cost of realism. Black-box neural audio synthesis and concatenative samplers can produce realistic audio, but have few mechanisms for control. In this work, we introduce MIDI-DDSP a hierarchical model of musical instruments…

    Submitted 17 March, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: Accepted by International Conference on Learning Representations (ICLR) 2022

  40. arXiv:2112.04716  [pdf, other]

    cs.LG

    DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization

    Authors: Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, Sergey Levine

    Abstract: Despite overparameterization, deep networks trained via supervised learning are easy to optimize and exhibit excellent generalization. One hypothesis to explain this is that overparameterized deep networks enjoy the benefits of implicit regularization induced by stochastic gradient descent, which favors parsimonious solutions that generalize well on test inputs. It is reasonable to surmise that de…

    Submitted 9 December, 2021; originally announced December 2021.

  41. arXiv:2111.12172  [pdf, other]

    cs.CV cs.AI cs.LG

    Multi-label Iterated Learning for Image Classification with Label Ambiguity

    Authors: Sai Rajeswar, Pau Rodriguez, Soumye Singhal, David Vazquez, Aaron Courville

    Abstract: Transfer learning from large-scale pre-trained models has become essential for many computer vision tasks. Recent studies have shown that datasets like ImageNet are weakly labeled since images with multiple object classes present are assigned a single label. This ambiguity biases models towards a single prediction, which could result in the suppression of classes that tend to co-occur in the data.…

    Submitted 23 November, 2021; originally announced November 2021.

  42. arXiv:2110.10139  [pdf, other]

    eess.AS cs.SD

    Chunked Autoregressive GAN for Conditional Waveform Synthesis

    Authors: Max Morrison, Rithesh Kumar, Kundan Kumar, Prem Seetharaman, Aaron Courville, Yoshua Bengio

    Abstract: Conditional waveform synthesis models learn a distribution of audio waveforms given conditioning such as text, mel-spectrograms, or MIDI. These systems employ deep generative models that model the waveform via either sequential (autoregressive) or parallel (non-autoregressive) sampling. Generative adversarial networks (GANs) have become a common choice for non-autoregressive waveform synthesis. Ho…

    Submitted 3 March, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: Published as a conference paper at ICLR 2022

  43. arXiv:2110.03372  [pdf, other]

    cs.LG cs.AI q-bio.BM stat.ME stat.ML

    Unifying Likelihood-free Inference with Black-box Optimization and Beyond

    Authors: Dinghuai Zhang, Jie Fu, Yoshua Bengio, Aaron Courville

    Abstract: Black-box optimization formulations for biological sequence design have drawn recent attention due to their promising potential impact on the pharmaceutical industry. In this work, we propose to unify two seemingly distinct worlds: likelihood-free inference and black-box optimization, under one probabilistic framework. In tandem, we provide a recipe for constructing various sequence design methods…

    Submitted 8 February, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

    Comments: ICLR 2022 spotlight

  44. arXiv:2109.11052  [pdf, other]

    cs.LG

    On Bonus-Based Exploration Methods in the Arcade Learning Environment

    Authors: Adrien Ali Taïga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare

    Abstract: Research on exploration in reinforcement learning, as applied to Atari 2600 game-playing, has emphasized tackling difficult exploration problems such as Montezuma's Revenge (Bellemare et al., 2016). Recently, bonus-based exploration methods, which explore by augmenting the environment reward, have reached above-human average performance on such domains. In this paper we reassess popular bonus-base…

    Submitted 22 September, 2021; originally announced September 2021.

    Comments: Full version of arXiv:1908.02388

    Journal ref: Published as a conference paper at ICLR 2020

  45. arXiv:2108.13264  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    Deep Reinforcement Learning at the Edge of the Statistical Precipice

    Authors: Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

    Abstract: Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Lea…

    Submitted 5 January, 2022; v1 submitted 30 August, 2021; originally announced August 2021.

    Comments: Outstanding Paper Award at NeurIPS 2021. Website: https://agarwl.github.io/rliable. 28 Pages, 33 Figures

  46. arXiv:2106.04799  [pdf, other]

    cs.LG

    Pretraining Representations for Data-Efficient Reinforcement Learning

    Authors: Max Schwarzer, Nitarshan Rajkumar, Michael Noukhovitch, Ankesh Anand, Laurent Charlin, Devon Hjelm, Philip Bachman, Aaron Courville

    Abstract: Data efficiency is a key challenge for deep reinforcement learning. We address this problem by using unlabeled data to pretrain an encoder which is then finetuned on a small amount of task-specific data. To encourage learning representations which capture diverse aspects of the underlying MDP, we employ a combination of latent dynamics modelling and unsupervised goal-conditioned RL. When limited t…

    Submitted 9 June, 2021; originally announced June 2021.

  47. arXiv:2106.02890  [pdf, other]

    cs.LG stat.ML

    Can Subnetwork Structure be the Key to Out-of-Distribution Generalization?

    Authors: Dinghuai Zhang, Kartik Ahuja, Yilun Xu, Yisen Wang, Aaron Courville

    Abstract: Can models with particular structure avoid being biased towards spurious correlation in out-of-distribution (OOD) generalization? Peters et al. (2016) provides a positive answer for linear cases. In this paper, we use a functional modular probing method to analyze deep model structures under OOD setting. We demonstrate that even in biased models (which focus on spurious correlation) there still ex…

    Submitted 5 June, 2021; originally announced June 2021.

    Comments: Accepted to ICML2021 as long talk

  48. arXiv:2106.02808  [pdf, other]

    cs.LG

    A Variational Perspective on Diffusion-Based Generative Models and Score Matching

    Authors: Chin-Wei Huang, Jae Hyun Lim, Aaron Courville

    Abstract: Discrete-time diffusion-based generative models and score matching methods have shown promising results in modeling high-dimensional image data. Recently, Song et al. (2021) show that diffusion processes that transform data into noise can be reversed via learning the score function, i.e. the gradient of the log-density of the perturbed data. They propose to plug the learned score function into an…

    Submitted 29 September, 2021; v1 submitted 5 June, 2021; originally announced June 2021.

  49. arXiv:2106.02719  [pdf, other]

    cs.CV

    Hierarchical Video Generation for Complex Data

    Authors: Lluis Castrejon, Nicolas Ballas, Aaron Courville

    Abstract: Videos can often be created by first outlining a global description of the scene and then adding local details. Inspired by this we propose a hierarchical model for video generation which follows a coarse to fine approach. First our model generates a low resolution video, establishing the global scene structure, that is then refined by subsequent levels in the hierarchy. We train each level in our…

    Submitted 4 June, 2021; originally announced June 2021.

  50. arXiv:2105.03519  [pdf, other]

    cs.CL

    Understanding by Understanding Not: Modeling Negation in Language Models

    Authors: Arian Hosseini, Siva Reddy, Dzmitry Bahdanau, R Devon Hjelm, Alessandro Sordoni, Aaron Courville

    Abstract: Negation is a core construction in natural language. Despite being very successful on many tasks, state-of-the-art pre-trained language models often handle negation incorrectly. To improve language models in this regard, we propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences from a raw text corpus. By training BERT with the r…

    Submitted 7 May, 2021; originally announced May 2021.