Skip to main content

Showing 1–43 of 43 results for author: Ballas, N

Searching in archive cs. Search in all archives.
.
  1. arXiv:2405.00740  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    Modeling Caption Diversity in Contrastive Vision-Language Pretraining

    Authors: Samuel Lavoie, Polina Kirichenko, Mark Ibrahim, Mahmoud Assran, Andrew Gordon Wilson, Aaron Courville, Nicolas Ballas

    Abstract: There are a thousand ways to caption an image. Contrastive Language Pretraining (CLIP) on the other hand, works by map** an image and its caption to a single vector -- limiting how well CLIP-like models can represent the diverse ways to describe an image. In this work, we introduce Llip, Latent Language Image Pretraining, which models the diversity of captions that could match an image. Llip's v… ▽ More

    Submitted 14 May, 2024; v1 submitted 29 April, 2024; originally announced May 2024.

    Comments: 14 pages, 8 figures, 7 tables, to be published at ICML2024

  2. arXiv:2404.08471  [pdf, other

    cs.CV cs.AI cs.LG

    Revisiting Feature Prediction for Learning Visual Representations from Video

    Authors: Adrien Bardes, Quentin Garrido, Jean Ponce, Xinlei Chen, Michael Rabbat, Yann LeCun, Mahmoud Assran, Nicolas Ballas

    Abstract: This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders, text, negative examples, reconstruction, or other sources of supervision. The models are trained on 2 million videos collected from public datase… ▽ More

    Submitted 15 February, 2024; originally announced April 2024.

  3. arXiv:2403.00504  [pdf, other

    cs.CV cs.AI cs.LG

    Learning and Leveraging World Models in Visual Representation Learning

    Authors: Quentin Garrido, Mahmoud Assran, Nicolas Ballas, Adrien Bardes, Laurent Najman, Yann LeCun

    Abstract: Joint-Embedding Predictive Architecture (JEPA) has emerged as a promising self-supervised approach that learns by leveraging a world model. While previously limited to predicting missing parts of an input, we explore how to generalize the JEPA prediction task to a broader set of corruptions. We introduce Image World Models, an approach that goes beyond masked image modeling and learns to predict t… ▽ More

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 23 pages, 16 figures

  4. arXiv:2312.12423  [pdf, other

    cs.CV cs.AI

    Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model

    Authors: Shraman Pramanick, Guangxing Han, Rui Hou, Sayan Nag, Ser-Nam Lim, Nicolas Ballas, Qifan Wang, Rama Chellappa, Amjad Almahairi

    Abstract: The ability of large language models (LLMs) to process visual inputs has given rise to general-purpose vision systems, unifying various vision-language (VL) tasks by instruction tuning. However, due to the enormous diversity in input-output formats in the vision domain, existing general-purpose models fail to successfully integrate segmentation and multi-image inputs with coarse-level tasks into a… ▽ More

    Submitted 19 June, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: CVPR 2024 Highlight

  5. arXiv:2309.16748  [pdf, other

    cs.LG cs.AI stat.ML

    Discovering environments with XRM

    Authors: Mohammad Pezeshki, Diane Bouchacourt, Mark Ibrahim, Nicolas Ballas, Pascal Vincent, David Lopez-Paz

    Abstract: Successful out-of-distribution generalization requires environment annotations. Unfortunately, these are resource-intensive to obtain, and their relevance to model performance is limited by the expectations and perceptual biases of human annotators. Therefore, to enable robust AI systems across applications, we must develop algorithms to automatically discover environments inducing broad generaliz… ▽ More

    Submitted 28 September, 2023; originally announced September 2023.

  6. arXiv:2308.00566  [pdf, other

    cs.CV cs.AI cs.LG

    Stochastic positional embeddings improve masked image modeling

    Authors: Amir Bar, Florian Bordes, Assaf Shocher, Mahmoud Assran, Pascal Vincent, Nicolas Ballas, Trevor Darrell, Amir Globerson, Yann LeCun

    Abstract: Masked Image Modeling (MIM) is a promising self-supervised learning approach that enables learning from unlabeled images. Despite its recent success, learning good representations through MIM remains challenging because it requires predicting the right semantic content in accurate locations. For example, given an incomplete picture of a dog, we can guess that there is a tail, but we cannot determi… ▽ More

    Submitted 27 February, 2024; v1 submitted 31 July, 2023; originally announced August 2023.

    Comments: Code and models available in https://github.com/amirbar/StoP

  7. arXiv:2304.07193  [pdf, other

    cs.CV

    DINOv2: Learning Robust Visual Features without Supervision

    Authors: Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin , et al. (1 additional authors not shown)

    Abstract: The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pr… ▽ More

    Submitted 2 February, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

  8. arXiv:2304.05369  [pdf, other

    cs.LG

    A surprisingly simple technique to control the pretraining bias for better transfer: Expand or Narrow your representation

    Authors: Florian Bordes, Samuel Lavoie, Randall Balestriero, Nicolas Ballas, Pascal Vincent

    Abstract: Self-Supervised Learning (SSL) models rely on a pretext task to learn representations. Because this pretext task differs from the downstream tasks used to evaluate the performance of these models, there is an inherent misalignment or pretraining bias. A commonly used trick in SSL, shown to make deep networks more robust to such bias, is the addition of a small projector (usually a 2 or 3 layer mul… ▽ More

    Submitted 11 April, 2023; originally announced April 2023.

  9. arXiv:2301.09451  [pdf, other

    cs.CV cs.AI cs.LG

    A Simple Recipe for Competitive Low-compute Self supervised Vision Models

    Authors: Quentin Duval, Ishan Misra, Nicolas Ballas

    Abstract: Self-supervised methods in vision have been mostly focused on large architectures as they seem to suffer from a significant performance drop for smaller architectures. In this paper, we propose a simple self-supervised distillation technique that can train high performance low-compute neural networks. Our main insight is that existing joint-embedding based SSL methods can be repurposed for knowled… ▽ More

    Submitted 23 January, 2023; originally announced January 2023.

  10. arXiv:2301.08243  [pdf, other

    cs.CV cs.AI cs.LG eess.IV

    Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

    Authors: Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, Nicolas Ballas

    Abstract: This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target block… ▽ More

    Submitted 13 April, 2023; v1 submitted 19 January, 2023; originally announced January 2023.

    Comments: 2023 IEEE/CVF International Conference on Computer Vision

  11. arXiv:2212.05195  [pdf, other

    cs.LG

    Uniform Masking Prevails in Vision-Language Pretraining

    Authors: Siddharth Verma, Yuchen Lu, Rui Hou, Hanchao Yu, Nicolas Ballas, Madian Khabsa, Amjad Almahairi

    Abstract: Masked Language Modeling (MLM) has proven to be an essential component of Vision-Language (VL) pretraining. To implement MLM, the researcher must make two design choices: the masking strategy, which determines which tokens to mask, and the masking rate, which determines how many tokens to mask. Previous work has focused primarily on the masking strategy while setting the masking rate at a default… ▽ More

    Submitted 9 December, 2022; originally announced December 2022.

  12. arXiv:2211.01866  [pdf, other

    cs.CV cs.LG

    ImageNet-X: Understanding Model Mistakes with Factor of Variation Annotations

    Authors: Badr Youbi Idrissi, Diane Bouchacourt, Randall Balestriero, Ivan Evtimov, Caner Hazirbas, Nicolas Ballas, Pascal Vincent, Michal Drozdzal, David Lopez-Paz, Mark Ibrahim

    Abstract: Deep learning vision systems are widely deployed across applications where reliability is critical. However, even today's best models can fail to recognize an object when its pose, lighting, or background varies. While existing benchmarks surface examples challenging for models, they do not explain why such mistakes arise. To address this need, we introduce ImageNet-X, a set of sixteen human annot… ▽ More

    Submitted 3 November, 2022; originally announced November 2022.

  13. arXiv:2210.08031  [pdf, other

    cs.LG cs.AI cs.CV cs.NE stat.ML

    Neural Attentive Circuits

    Authors: Nasim Rahaman, Martin Weiss, Francesco Locatello, Chris Pal, Yoshua Bengio, Bernhard Schölkopf, Li Erran Li, Nicolas Ballas

    Abstract: Recent work has seen the development of general purpose neural architectures that can be trained to perform tasks across diverse data modalities. General purpose models typically make few assumptions about the underlying data-structure and are known to perform well in the large-data regime. At the same time, there has been growing interest in modular neural architectures that represent the data us… ▽ More

    Submitted 19 October, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: To appear at NeurIPS 2022

  14. arXiv:2210.07277  [pdf, other

    cs.LG cs.AI cs.CV

    The Hidden Uniform Cluster Prior in Self-Supervised Learning

    Authors: Mahmoud Assran, Randall Balestriero, Quentin Duval, Florian Bordes, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Nicolas Ballas

    Abstract: A successful paradigm in representation learning is to perform self-supervised pretraining using tasks based on mini-batch statistics (e.g., SimCLR, VICReg, SwAV, MSN). We show that in the formulation of all these methods is an overlooked prior to learn features that enable uniform clustering of the data. While this prior has led to remarkably semantic representations when pretraining on class-bal… ▽ More

    Submitted 13 October, 2022; originally announced October 2022.

  15. arXiv:2206.00735  [pdf, other

    cs.CV cs.LG

    Cascaded Video Generation for Videos In-the-Wild

    Authors: Lluis Castrejon, Nicolas Ballas, Aaron Courville

    Abstract: Videos can be created by first outlining a global view of the scene and then adding local details. Inspired by this idea we propose a cascaded model for video generation which follows a coarse to fine approach. First our model generates a low resolution video, establishing the global scene structure, which is then refined by subsequent cascade levels operating at larger resolutions. We train each… ▽ More

    Submitted 1 June, 2022; originally announced June 2022.

    Comments: Accepted to the 26th International Conference on Pattern Recognition (ICPR 2022). arXiv admin note: substantial text overlap with arXiv:2106.02719

  16. arXiv:2204.07141  [pdf, other

    cs.LG cs.AI cs.CV eess.IV

    Masked Siamese Networks for Label-Efficient Learning

    Authors: Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas

    Abstract: We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations. Our approach matches the representation of an image view containing randomly masked patches to the representation of the original unmasked image. This self-supervised pre-training strategy is particularly scalable when applied to Vision Transformers since only the unmasked patches are… ▽ More

    Submitted 14 April, 2022; originally announced April 2022.

  17. arXiv:2201.00072  [pdf, other

    cs.LG

    BARACK: Partially Supervised Group Robustness With Guarantees

    Authors: Nimit S. Sohoni, Maziar Sanjabi, Nicolas Ballas, Aditya Grover, Shaoliang Nie, Hamed Firooz, Christopher Ré

    Abstract: While neural networks have shown remarkable success on classification tasks in terms of average-case performance, they often fail to perform well on certain groups of the data. Such group information may be expensive to obtain; thus, recent works in robustness and fairness have proposed ways to improve worst-group performance even when group labels are unavailable for the training data. However, t… ▽ More

    Submitted 10 April, 2022; v1 submitted 31 December, 2021; originally announced January 2022.

    Comments: 26 pages

  18. arXiv:2110.08133  [pdf, other

    cs.LG cs.CV

    Trade-offs of Local SGD at Scale: An Empirical Study

    Authors: Jose Javier Gonzalez Ortiz, Jonathan Frankle, Mike Rabbat, Ari Morcos, Nicolas Ballas

    Abstract: As datasets and models become increasingly large, distributed training has become a necessary component to allow deep neural networks to train in reasonable amounts of time. However, distributed training can have substantial communication overhead that hinders its scalability. One strategy for reducing this overhead is to perform multiple unsynchronized SGD steps independently on each worker betwe… ▽ More

    Submitted 15 October, 2021; originally announced October 2021.

  19. arXiv:2106.02719  [pdf, other

    cs.CV

    Hierarchical Video Generation for Complex Data

    Authors: Lluis Castrejon, Nicolas Ballas, Aaron Courville

    Abstract: Videos can often be created by first outlining a global description of the scene and then adding local details. Inspired by this we propose a hierarchical model for video generation which follows a coarse to fine approach. First our model generates a low resolution video, establishing the global scene structure, that is then refined by subsequent levels in the hierarchy. We train each level in our… ▽ More

    Submitted 4 June, 2021; originally announced June 2021.

  20. arXiv:2104.13963  [pdf, other

    cs.CV cs.AI cs.LG eess.IV

    Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples

    Authors: Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Armand Joulin, Nicolas Ballas, Michael Rabbat

    Abstract: This paper proposes a novel method of learning by predicting view assignments with support samples (PAWS). The method trains a model to minimize a consistency loss, which ensures that different views of the same unlabeled instance are assigned similar pseudo-labels. The pseudo-labels are generated non-parametrically, by comparing the representations of the image views to those of a set of randomly… ▽ More

    Submitted 30 July, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

    Journal ref: ICCV 2021

  21. arXiv:2010.02838  [pdf, other

    cs.LG cs.DC math.OC

    A Closer Look at Codistillation for Distributed Training

    Authors: Shagun Sodhani, Olivier Delalleau, Mahmoud Assran, Koustuv Sinha, Nicolas Ballas, Michael Rabbat

    Abstract: Codistillation has been proposed as a mechanism to share knowledge among concurrently trained models by encouraging them to represent the same function through an auxiliary loss. This contrasts with the more commonly used fully-synchronous data-parallel stochastic gradient descent methods, where different model replicas average their gradients (or parameters) at every iteration and thus maintain i… ▽ More

    Submitted 25 July, 2021; v1 submitted 6 October, 2020; originally announced October 2020.

    Comments: Under review

  22. arXiv:2006.12279  [pdf, other

    cs.LG stat.ML

    Revisiting Loss Modelling for Unstructured Pruning

    Authors: César Laurent, Camille Ballas, Thomas George, Nicolas Ballas, Pascal Vincent

    Abstract: By removing parameters from deep neural networks, unstructured pruning methods aim at cutting down memory footprint and computational cost, while maintaining prediction accuracy. In order to tackle this otherwise intractable problem, many of these methods model the loss landscape using first or second order Taylor expansions to identify which parameters can be discarded. We revisit loss modelling… ▽ More

    Submitted 22 June, 2020; originally announced June 2020.

  23. arXiv:2006.10803  [pdf, other

    cs.LG cs.CV stat.ML

    Supervision Accelerates Pre-training in Contrastive Semi-Supervised Learning of Visual Representations

    Authors: Mahmoud Assran, Nicolas Ballas, Lluis Castrejon, Michael Rabbat

    Abstract: We investigate a strategy for improving the efficiency of contrastive learning of visual representations by leveraging a small amount of supervised information during pre-training. We propose a semi-supervised loss, SuNCEt, based on noise-contrastive estimation and neighbourhood component analysis, that aims to distinguish examples of different classes in addition to the self-supervised instance-w… ▽ More

    Submitted 1 December, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

  24. arXiv:1910.00643  [pdf, other

    cs.LG cs.DC math.OC stat.ML

    SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum

    Authors: Jianyu Wang, Vinayak Tantia, Nicolas Ballas, Michael Rabbat

    Abstract: Distributed optimization is essential for training large models on large datasets. Multiple approaches have been proposed to reduce the communication overhead in distributed training, such as synchronizing only after performing multiple local SGD steps, and decentralized methods (e.g., using gossip algorithms) to decouple communications among workers. Although these methods run faster than AllRedu… ▽ More

    Submitted 19 February, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

    Comments: Accepted to ICLR 2020

  25. arXiv:1908.06037  [pdf, other

    cs.CV cs.LG

    Needles in Haystacks: On Classifying Tiny Objects in Large Images

    Authors: Nick Pawlowski, Suvrat Bhooshan, Nicolas Ballas, Francesco Ciompi, Ben Glocker, Michal Drozdzal

    Abstract: In some important computer vision domains, such as medical or hyperspectral imaging, we care about the classification of tiny objects in large images. However, most Convolutional Neural Networks (CNNs) for image classification were developed using biased datasets that contain large objects, in mostly central image positions. To assess whether classical CNN architectures work well for tiny object c… ▽ More

    Submitted 6 January, 2020; v1 submitted 16 August, 2019; originally announced August 2019.

  26. arXiv:1906.04585  [pdf, other

    cs.LG cs.AI cs.MA math.OC stat.ML

    Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning

    Authors: Mahmoud Assran, Joshua Romoff, Nicolas Ballas, Joelle Pineau, Michael Rabbat

    Abstract: Multi-simulator training has contributed to the recent success of Deep Reinforcement Learning by stabilizing learning and allowing for higher training throughputs. We propose Gossip-based Actor-Learner Architectures (GALA) where several actor-learners (such as A2C agents) are organized in a peer-to-peer communication topology, and exchange information through asynchronous gossip in order to take a… ▽ More

    Submitted 21 April, 2020; v1 submitted 9 June, 2019; originally announced June 2019.

    Journal ref: Advances in Neural Information Processing Systems (2019) 13299-13309

  27. arXiv:1904.12165  [pdf, other

    cs.CV cs.LG

    Improved Conditional VRNNs for Video Prediction

    Authors: Lluis Castrejon, Nicolas Ballas, Aaron Courville

    Abstract: Predicting future frames for a video sequence is a challenging generative modeling task. Promising approaches include probabilistic latent variable models such as the Variational Auto-Encoder. While VAEs can handle uncertainty and model multiple possible future outcomes, they have a tendency to produce blurry predictions. In this work we argue that this is a sign of underfitting. To address this i… ▽ More

    Submitted 27 April, 2019; originally announced April 2019.

    Comments: Project page: https://sites.google.com/view/videovrnn

  28. arXiv:1811.10792  [pdf, other

    cs.LG cs.AI cs.DC cs.MA math.OC stat.ML

    Stochastic Gradient Push for Distributed Deep Learning

    Authors: Mahmoud Assran, Nicolas Loizou, Nicolas Ballas, Michael Rabbat

    Abstract: Distributed data-parallel algorithms aim to accelerate the training of deep neural networks by parallelizing the computation of large mini-batch gradient updates across multiple nodes. Approaches that synchronize nodes using exact distributed averaging (e.g., via AllReduce) are sensitive to stragglers and communication delays. The PushSum gossip algorithm is robust to these issues, but only perfor… ▽ More

    Submitted 14 May, 2019; v1 submitted 26 November, 2018; originally announced November 2018.

    Comments: ICML 2019

    Journal ref: International Conference on Machine Learning 97 (2019) 344-353

  29. arXiv:1807.05031  [pdf, other

    stat.ML cs.LG

    On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length

    Authors: Stanisław Jastrzębski, Zachary Kenton, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey

    Abstract: Stochastic Gradient Descent (SGD) based training of neural networks with a large learning rate or a small batch-size typically ends in well-generalizing, flat regions of the weight space, as indicated by small eigenvalues of the Hessian of the training loss. However, the curvature along the SGD trajectory is poorly understood. An empirical investigation shows that initially SGD visits increasingly… ▽ More

    Submitted 23 December, 2019; v1 submitted 13 July, 2018; originally announced July 2018.

    Journal ref: International Conference on Learning Representations (ICLR) 2019

  30. arXiv:1806.07937  [pdf, other

    cs.LG cs.AI stat.ML

    A Dissection of Overfitting and Generalization in Continuous Reinforcement Learning

    Authors: Amy Zhang, Nicolas Ballas, Joelle Pineau

    Abstract: The risks and perils of overfitting in machine learning are well known. However most of the treatment of this, including diagnostic tools and remedies, was developed for the supervised learning case. In this work, we aim to offer new perspectives on the characterization and prevention of overfitting in deep Reinforcement Learning (RL) methods, with a particular focus on continuous domains. We exam… ▽ More

    Submitted 25 June, 2018; v1 submitted 20 June, 2018; originally announced June 2018.

    Comments: 20 pages, 16 figures

  31. arXiv:1806.03884  [pdf, other

    cs.LG stat.ML

    Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis

    Authors: Thomas George, César Laurent, Xavier Bouthillier, Nicolas Ballas, Pascal Vincent

    Abstract: Optimization algorithms that leverage gradient covariance information, such as variants of natural gradient descent (Amari, 1998), offer the prospect of yielding more effective descent directions. For models with many parameters, the covariance matrix they are based on becomes gigantic, making them inapplicable in their original form. This has motivated research into both simple diagonal approxima… ▽ More

    Submitted 26 July, 2021; v1 submitted 11 June, 2018; originally announced June 2018.

    Journal ref: Advances in Neural Information Processing Systems 2018

  32. arXiv:1711.04623  [pdf, other

    cs.LG cs.AI cs.CV stat.ML

    Three Factors Influencing Minima in SGD

    Authors: Stanisław Jastrzębski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey

    Abstract: We investigate the dynamical and convergent properties of stochastic gradient descent (SGD) applied to Deep Neural Networks (DNNs). Characterizing the relation between learning rate, batch size and the properties of the final minima, such as width or generalization, remains an open question. In order to tackle this problem we investigate the previously proposed approximation of SGD by a stochastic… ▽ More

    Submitted 13 September, 2018; v1 submitted 13 November, 2017; originally announced November 2017.

    Comments: First two authors contributed equally. Short version accepted into ICLR workshop. Accepted to Artificial Neural Networks and Machine Learning, ICANN 2018

  33. arXiv:1710.04773  [pdf, other

    cs.CV

    Residual Connections Encourage Iterative Inference

    Authors: Stanisław Jastrzębski, Devansh Arpit, Nicolas Ballas, Vikas Verma, Tong Che, Yoshua Bengio

    Abstract: Residual networks (Resnets) have become a prominent architecture in deep learning. However, a comprehensive understanding of Resnets is still a topic of ongoing research. A recent view argues that Resnets perform iterative refinement of features. We attempt to further expose properties of this aspect. To this end, we study Resnets both analytically and empirically. We formalize the notion of ite… ▽ More

    Submitted 8 March, 2018; v1 submitted 12 October, 2017; originally announced October 2017.

    Comments: First two authors contributed equally. Published in ICLR 2018

  34. arXiv:1706.05394  [pdf, other

    stat.ML cs.LG

    A Closer Look at Memorization in Deep Networks

    Authors: Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, Simon Lacoste-Julien

    Abstract: We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. r… ▽ More

    Submitted 1 July, 2017; v1 submitted 16 June, 2017; originally announced June 2017.

    Comments: Appears in Proceedings of the 34th International Conference on Machine Learning (ICML 2017), Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, and David Krueger contributed equally to this work

  35. arXiv:1611.07810  [pdf, other

    cs.CV

    A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering

    Authors: Tegan Maharaj, Nicolas Ballas, Anna Rohrbach, Aaron Courville, Christopher Pal

    Abstract: While deep convolutional neural networks frequently approach or exceed human-level performance at benchmark tasks involving static images, extending this success to moving images is not straightforward. Having models which can learn to understand video is of interest for many applications, including content recommendation, prediction, summarization, event/object detection and understanding human v… ▽ More

    Submitted 5 February, 2017; v1 submitted 23 November, 2016; originally announced November 2016.

  36. arXiv:1606.01305  [pdf, other

    cs.NE cs.CL cs.LG

    Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations

    Authors: David Krueger, Tegan Maharaj, János Kramár, Mohammad Pezeshki, Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal, Yoshua Bengio, Aaron Courville, Chris Pal

    Abstract: We propose zoneout, a novel method for regularizing RNNs. At each timestep, zoneout stochastically forces some hidden units to maintain their previous values. Like dropout, zoneout uses random noise to train a pseudo-ensemble, improving generalization. But by preserving instead of drop** hidden units, gradient information and state information are more readily propagated through time, as in feed… ▽ More

    Submitted 22 September, 2017; v1 submitted 3 June, 2016; originally announced June 2016.

    Comments: David Krueger and Tegan Maharaj contributed equally to this work

  37. arXiv:1605.02688  [pdf, other

    cs.SC cs.LG cs.MS

    Theano: A Python framework for fast computation of mathematical expressions

    Authors: The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano , et al. (88 additional authors not shown)

    Abstract: Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano is being actively and continuously developed since 2008, mu… ▽ More

    Submitted 9 May, 2016; originally announced May 2016.

    Comments: 19 pages, 5 figures

  38. arXiv:1603.09025  [pdf, other

    cs.LG

    Recurrent Batch Normalization

    Authors: Tim Cooijmans, Nicolas Ballas, César Laurent, Çağlar Gülçehre, Aaron Courville

    Abstract: We propose a reparameterization of LSTM that brings the benefits of batch normalization to recurrent neural networks. Whereas previous works only apply batch normalization to the input-to-hidden transformation of RNNs, we demonstrate that it is both possible and beneficial to batch-normalize the hidden-to-hidden transition, thereby reducing internal covariate shift between time steps. We evaluate… ▽ More

    Submitted 27 February, 2017; v1 submitted 29 March, 2016; originally announced March 2016.

  39. arXiv:1511.07838  [pdf, other

    cs.LG cs.NE

    Dynamic Capacity Networks

    Authors: Amjad Almahairi, Nicolas Ballas, Tim Cooijmans, Yin Zheng, Hugo Larochelle, Aaron Courville

    Abstract: We introduce the Dynamic Capacity Network (DCN), a neural network that can adaptively assign its capacity across different portions of the input data. This is achieved by combining modules of two types: low-capacity sub-networks and high-capacity sub-networks. The low-capacity sub-networks are applied across most of the input, but also provide a guide to select a few portions of the input on which… ▽ More

    Submitted 22 May, 2016; v1 submitted 24 November, 2015; originally announced November 2015.

    Comments: ICML 2016

  40. arXiv:1511.06432  [pdf, other

    cs.CV cs.LG cs.NE

    Delving Deeper into Convolutional Networks for Learning Video Representations

    Authors: Nicolas Ballas, Li Yao, Chris Pal, Aaron Courville

    Abstract: We propose an approach to learn spatio-temporal features in videos from intermediate visual representations we call "percepts" using Gated-Recurrent-Unit Recurrent Networks (GRUs).Our method relies on percepts that are extracted from all level of a deep convolutional network trained on the large ImageNet dataset. While high-level percepts contain highly discriminative information, they tend to hav… ▽ More

    Submitted 1 March, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: ICLR 2016

  41. arXiv:1511.04590  [pdf, other

    cs.CV cs.CL stat.ML

    Oracle performance for visual captioning

    Authors: Li Yao, Nicolas Ballas, Kyunghyun Cho, John R. Smith, Yoshua Bengio

    Abstract: The task of associating images and videos with a natural language description has attracted a great amount of attention recently. Rapid progress has been made in terms of both develo** novel algorithms and releasing new datasets. Indeed, the state-of-the-art results on some of the standard datasets have been pushed into the regime where it has become more and more difficult to make significant i… ▽ More

    Submitted 14 September, 2016; v1 submitted 14 November, 2015; originally announced November 2015.

    Comments: BMVC2016 (Oral paper)

  42. arXiv:1502.08029  [pdf, other

    stat.ML cs.AI cs.CL cs.CV cs.LG

    Describing Videos by Exploiting Temporal Structure

    Authors: Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville

    Abstract: Recent progress in using recurrent neural networks (RNNs) for image description has motivated the exploration of their application for video description. However, while images are static, working with videos requires modeling their dynamic temporal structure and then properly integrating that information into a natural language description. In this context, we propose an approach that successfully… ▽ More

    Submitted 30 September, 2015; v1 submitted 27 February, 2015; originally announced February 2015.

    Comments: Accepted to ICCV15. This version comes with code release and supplementary material

  43. arXiv:1412.6550  [pdf, ps, other

    cs.LG cs.NE

    FitNets: Hints for Thin Deep Nets

    Authors: Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, Yoshua Bengio

    Abstract: While depth tends to improve network performances, it also makes gradient-based training more difficult since deeper networks tend to be more non-linear. The recently proposed knowledge distillation approach is aimed at obtaining small and fast-to-execute models, and it has shown that a student network could imitate the soft output of a larger teacher network or ensemble of networks. In this paper… ▽ More

    Submitted 27 March, 2015; v1 submitted 19 December, 2014; originally announced December 2014.