Skip to main content

Showing 1–19 of 19 results for author: Assran, M

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.09415  [pdf, other

    cs.CV cs.LG

    An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

    Authors: Duy-Kien Nguyen, Mahmoud Assran, Unnat Jain, Martin R. Oswald, Cees G. M. Snoek, Xinlei Chen

    Abstract: This work does not introduce a new method. Instead, we present an interesting finding that questions the necessity of the inductive bias -- locality in modern computer vision architectures. Concretely, we find that vanilla Transformers can operate by directly treating each individual pixel as a token and achieve highly performant results. This is substantially different from the popular design in… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Technical report, 23 pages

  2. arXiv:2405.00740  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    Modeling Caption Diversity in Contrastive Vision-Language Pretraining

    Authors: Samuel Lavoie, Polina Kirichenko, Mark Ibrahim, Mahmoud Assran, Andrew Gordon Wilson, Aaron Courville, Nicolas Ballas

    Abstract: There are a thousand ways to caption an image. Contrastive Language Pretraining (CLIP) on the other hand, works by map** an image and its caption to a single vector -- limiting how well CLIP-like models can represent the diverse ways to describe an image. In this work, we introduce Llip, Latent Language Image Pretraining, which models the diversity of captions that could match an image. Llip's v… ▽ More

    Submitted 14 May, 2024; v1 submitted 29 April, 2024; originally announced May 2024.

    Comments: 14 pages, 8 figures, 7 tables, to be published at ICML2024

  3. arXiv:2404.08471  [pdf, other

    cs.CV cs.AI cs.LG

    Revisiting Feature Prediction for Learning Visual Representations from Video

    Authors: Adrien Bardes, Quentin Garrido, Jean Ponce, Xinlei Chen, Michael Rabbat, Yann LeCun, Mahmoud Assran, Nicolas Ballas

    Abstract: This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders, text, negative examples, reconstruction, or other sources of supervision. The models are trained on 2 million videos collected from public datase… ▽ More

    Submitted 15 February, 2024; originally announced April 2024.

  4. arXiv:2403.00504  [pdf, other

    cs.CV cs.AI cs.LG

    Learning and Leveraging World Models in Visual Representation Learning

    Authors: Quentin Garrido, Mahmoud Assran, Nicolas Ballas, Adrien Bardes, Laurent Najman, Yann LeCun

    Abstract: Joint-Embedding Predictive Architecture (JEPA) has emerged as a promising self-supervised approach that learns by leveraging a world model. While previously limited to predicting missing parts of an input, we explore how to generalize the JEPA prediction task to a broader set of corruptions. We introduce Image World Models, an approach that goes beyond masked image modeling and learns to predict t… ▽ More

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 23 pages, 16 figures

  5. arXiv:2308.00566  [pdf, other

    cs.CV cs.AI cs.LG

    Stochastic positional embeddings improve masked image modeling

    Authors: Amir Bar, Florian Bordes, Assaf Shocher, Mahmoud Assran, Pascal Vincent, Nicolas Ballas, Trevor Darrell, Amir Globerson, Yann LeCun

    Abstract: Masked Image Modeling (MIM) is a promising self-supervised learning approach that enables learning from unlabeled images. Despite its recent success, learning good representations through MIM remains challenging because it requires predicting the right semantic content in accurate locations. For example, given an incomplete picture of a dog, we can guess that there is a tail, but we cannot determi… ▽ More

    Submitted 27 February, 2024; v1 submitted 31 July, 2023; originally announced August 2023.

    Comments: Code and models available in https://github.com/amirbar/StoP

  6. arXiv:2304.07193  [pdf, other

    cs.CV

    DINOv2: Learning Robust Visual Features without Supervision

    Authors: Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin , et al. (1 additional authors not shown)

    Abstract: The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pr… ▽ More

    Submitted 2 February, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

  7. arXiv:2302.14483  [pdf, other

    cs.LG cs.CV stat.ML

    RoPAWS: Robust Semi-supervised Representation Learning from Uncurated Data

    Authors: Sangwoo Mo, Jong-Chyi Su, Chih-Yao Ma, Mido Assran, Ishan Misra, Licheng Yu, Sean Bell

    Abstract: Semi-supervised learning aims to train a model using limited labels. State-of-the-art semi-supervised methods for image classification such as PAWS rely on self-supervised representations learned with large-scale unlabeled but curated data. However, PAWS is often less effective when using real-world unlabeled data that is uncurated, e.g., contains out-of-class data. We propose RoPAWS, a robust ext… ▽ More

    Submitted 28 February, 2023; originally announced February 2023.

    Comments: ICLR 2023

  8. arXiv:2301.08243  [pdf, other

    cs.CV cs.AI cs.LG eess.IV

    Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

    Authors: Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, Nicolas Ballas

    Abstract: This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target block… ▽ More

    Submitted 13 April, 2023; v1 submitted 19 January, 2023; originally announced January 2023.

    Comments: 2023 IEEE/CVF International Conference on Computer Vision

  9. arXiv:2210.07277  [pdf, other

    cs.LG cs.AI cs.CV

    The Hidden Uniform Cluster Prior in Self-Supervised Learning

    Authors: Mahmoud Assran, Randall Balestriero, Quentin Duval, Florian Bordes, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Nicolas Ballas

    Abstract: A successful paradigm in representation learning is to perform self-supervised pretraining using tasks based on mini-batch statistics (e.g., SimCLR, VICReg, SwAV, MSN). We show that in the formulation of all these methods is an overlooked prior to learn features that enable uniform clustering of the data. While this prior has led to remarkably semantic representations when pretraining on class-bal… ▽ More

    Submitted 13 October, 2022; originally announced October 2022.

  10. arXiv:2204.07141  [pdf, other

    cs.LG cs.AI cs.CV eess.IV

    Masked Siamese Networks for Label-Efficient Learning

    Authors: Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas

    Abstract: We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations. Our approach matches the representation of an image view containing randomly masked patches to the representation of the original unmasked image. This self-supervised pre-training strategy is particularly scalable when applied to Vision Transformers since only the unmasked patches are… ▽ More

    Submitted 14 April, 2022; originally announced April 2022.

  11. arXiv:2106.10708  [pdf, other

    cs.LG math.OC

    Memory Augmented Optimizers for Deep Learning

    Authors: Paul-Aymeric McRae, Prasanna Parthasarathi, Mahmoud Assran, Sarath Chandar

    Abstract: Popular approaches for minimizing loss in data-driven learning often involve an abstraction or an explicit retention of the history of gradients for efficient parameter updates. The aggregated history of gradients nudges the parameter updates in the right direction even when the gradients at any given step are not informative. Although the history of gradients summarized in meta-parameters or expl… ▽ More

    Submitted 20 June, 2021; originally announced June 2021.

    Comments: 24 Pages. Currently under review

  12. arXiv:2104.13963  [pdf, other

    cs.CV cs.AI cs.LG eess.IV

    Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples

    Authors: Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Armand Joulin, Nicolas Ballas, Michael Rabbat

    Abstract: This paper proposes a novel method of learning by predicting view assignments with support samples (PAWS). The method trains a model to minimize a consistency loss, which ensures that different views of the same unlabeled instance are assigned similar pseudo-labels. The pseudo-labels are generated non-parametrically, by comparing the representations of the image views to those of a set of randomly… ▽ More

    Submitted 30 July, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

    Journal ref: ICCV 2021

  13. arXiv:2010.02838  [pdf, other

    cs.LG cs.DC math.OC

    A Closer Look at Codistillation for Distributed Training

    Authors: Shagun Sodhani, Olivier Delalleau, Mahmoud Assran, Koustuv Sinha, Nicolas Ballas, Michael Rabbat

    Abstract: Codistillation has been proposed as a mechanism to share knowledge among concurrently trained models by encouraging them to represent the same function through an auxiliary loss. This contrasts with the more commonly used fully-synchronous data-parallel stochastic gradient descent methods, where different model replicas average their gradients (or parameters) at every iteration and thus maintain i… ▽ More

    Submitted 25 July, 2021; v1 submitted 6 October, 2020; originally announced October 2020.

    Comments: Under review

  14. arXiv:2006.13838  [pdf, other

    cs.LG math.OC stat.ML

    Advances in Asynchronous Parallel and Distributed Optimization

    Authors: Mahmoud Assran, Arda Aytekin, Hamid Feyzmahdavian, Mikael Johansson, Michael Rabbat

    Abstract: Motivated by large-scale optimization problems arising in the context of machine learning, there have been several advances in the study of asynchronous parallel and distributed optimization methods during the past decade. Asynchronous methods do not require all processors to maintain a consistent view of the optimization variables. Consequently, they generally can make more efficient use of compu… ▽ More

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: 33 pages, 4 figures

  15. arXiv:2006.10803  [pdf, other

    cs.LG cs.CV stat.ML

    Supervision Accelerates Pre-training in Contrastive Semi-Supervised Learning of Visual Representations

    Authors: Mahmoud Assran, Nicolas Ballas, Lluis Castrejon, Michael Rabbat

    Abstract: We investigate a strategy for improving the efficiency of contrastive learning of visual representations by leveraging a small amount of supervised information during pre-training. We propose a semi-supervised loss, SuNCEt, based on noise-contrastive estimation and neighbourhood component analysis, that aims to distinguish examples of different classes in addition to the self-supervised instance-w… ▽ More

    Submitted 1 December, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

  16. arXiv:2002.12414  [pdf, other

    cs.LG math.OC stat.ML

    On the Convergence of Nesterov's Accelerated Gradient Method in Stochastic Settings

    Authors: Mahmoud Assran, Michael Rabbat

    Abstract: We study Nesterov's accelerated gradient method with constant step-size and momentum parameters in the stochastic approximation setting (unbiased gradients with bounded variance) and the finite-sum setting (where randomness is due to sampling mini-batches). To build better insight into the behavior of Nesterov's method in stochastic settings, we focus throughout on objectives that are smooth, stro… ▽ More

    Submitted 27 June, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

    Journal ref: International Conference on Machine Learning (ICML 2020)

  17. arXiv:1906.04585  [pdf, other

    cs.LG cs.AI cs.MA math.OC stat.ML

    Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning

    Authors: Mahmoud Assran, Joshua Romoff, Nicolas Ballas, Joelle Pineau, Michael Rabbat

    Abstract: Multi-simulator training has contributed to the recent success of Deep Reinforcement Learning by stabilizing learning and allowing for higher training throughputs. We propose Gossip-based Actor-Learner Architectures (GALA) where several actor-learners (such as A2C agents) are organized in a peer-to-peer communication topology, and exchange information through asynchronous gossip in order to take a… ▽ More

    Submitted 21 April, 2020; v1 submitted 9 June, 2019; originally announced June 2019.

    Journal ref: Advances in Neural Information Processing Systems (2019) 13299-13309

  18. arXiv:1811.10792  [pdf, other

    cs.LG cs.AI cs.DC cs.MA math.OC stat.ML

    Stochastic Gradient Push for Distributed Deep Learning

    Authors: Mahmoud Assran, Nicolas Loizou, Nicolas Ballas, Michael Rabbat

    Abstract: Distributed data-parallel algorithms aim to accelerate the training of deep neural networks by parallelizing the computation of large mini-batch gradient updates across multiple nodes. Approaches that synchronize nodes using exact distributed averaging (e.g., via AllReduce) are sensitive to stragglers and communication delays. The PushSum gossip algorithm is robust to these issues, but only perfor… ▽ More

    Submitted 14 May, 2019; v1 submitted 26 November, 2018; originally announced November 2018.

    Comments: ICML 2019

    Journal ref: International Conference on Machine Learning 97 (2019) 344-353

  19. arXiv:1803.08950  [pdf, other

    cs.MA eess.SY math.OC

    Asynchronous Gradient-Push

    Authors: Mahmoud Assran, Michael Rabbat

    Abstract: We consider a multi-agent framework for distributed optimization where each agent has access to a local smooth strongly convex function, and the collective goal is to achieve consensus on the parameters that minimize the sum of the agents' local functions. We propose an algorithm wherein each agent operates asynchronously and independently of the other agents. When the local functions are strongly… ▽ More

    Submitted 2 March, 2020; v1 submitted 23 March, 2018; originally announced March 2018.

    Comments: 33 pages, 9 figures, accepted to IEEE Transactions on Automatic Control

    Journal ref: IEEE Transactions on Automatic Control (2020)