Showing 1–12 of 12 results for author: Oquab, M

  1. arXiv:2406.09294

    cs.LG cs.CV

    You Don't Need Data-Augmentation in Self-Supervised Learning

    Authors: Théo Moutakanni, Maxime Oquab, Marc Szafraniec, Maria Vakalopoulou, Piotr Bojanowski

    Abstract: Self-Supervised learning (SSL) with Joint-Embedding Architectures (JEA) has led to outstanding performances. All instantiations of this paradigm were trained using strong and well-established hand-crafted data augmentations, leading to the general belief that they are required for the proper training and performance of such models. On the other hand, generative reconstruction-based models such as…

    Submitted 13 June, 2024; originally announced June 2024.

  2. arXiv:2405.15613

    cs.LG cs.AI cs.CV

    Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

    Authors: Huy V. Vo, Vasil Khalidov, Timothée Darcet, Théo Moutakanni, Nikita Smetanin, Marc Szafraniec, Hugo Touvron, Camille Couprie, Maxime Oquab, Armand Joulin, Hervé Jégou, Patrick Labatut, Piotr Bojanowski

    Abstract: Self-supervised features are the cornerstone of modern machine learning systems. They are typically pre-trained on data collections whose construction and curation typically require extensive human effort. This manual process has some limitations similar to those encountered in supervised learning, e.g., the crowd-sourced selection of data is costly and time-consuming, preventing scaling the datas…

    Submitted 28 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  3. arXiv:2405.01469

    cs.CV cs.AI

    Advancing human-centric AI for robust X-ray analysis through holistic self-supervised learning

    Authors: Théo Moutakanni, Piotr Bojanowski, Guillaume Chassagnon, Céline Hudelot, Armand Joulin, Yann LeCun, Matthew Muckley, Maxime Oquab, Marie-Pierre Revel, Maria Vakalopoulou

    Abstract: AI Foundation models are gaining traction in various applications, including medical fields like radiology. However, medical foundation models are often tested on limited tasks, leaving their generalisability and biases unexplored. We present RayDINO, a large visual encoder trained by self-supervision on 873k chest X-rays. We compare RayDINO to previous state-of-the-art models across nine radiolog…

    Submitted 2 May, 2024; originally announced May 2024.

  4. arXiv:2309.16588

    cs.CV

    Vision Transformers Need Registers

    Authors: Timothée Darcet, Maxime Oquab, Julien Mairal, Piotr Bojanowski

    Abstract: Transformers have recently emerged as a powerful tool for learning visual representations. In this paper, we identify and characterize artifacts in feature maps of both supervised and self-supervised ViT networks. The artifacts correspond to high-norm tokens appearing during inference primarily in low-informative background areas of images, that are repurposed for internal computations. We propose…

    Submitted 12 April, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

  5. arXiv:2304.07193

    cs.CV

    DINOv2: Learning Robust Visual Features without Supervision

    Authors: Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, et al. (1 additional author not shown)

    Abstract: The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pr…

    Submitted 2 February, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

  6. arXiv:2212.04884

    cs.CV

    Co-training $2^L$ Submodels for Visual Recognition

    Authors: Hugo Touvron, Matthieu Cord, Maxime Oquab, Piotr Bojanowski, Jakob Verbeek, Hervé Jégou

    Abstract: We introduce submodel co-training, a regularization method related to co-training, self-distillation and stochastic depth. Given a neural network to be trained, for each sample we implicitly instantiate two altered networks, ``submodels'', with stochastic depth: we activate only a subset of the layers. Each network serves as a soft teacher to the other, by providing a loss that complements the reg…

    Submitted 9 December, 2022; originally announced December 2022.

  7. arXiv:2203.08765

    cs.CV

    Efficient conditioned face animation using frontally-viewed embedding

    Authors: Maxime Oquab, Daniel Haziza, Ludovic Schwartz, Tao Xu, Katayoun Zand, Rui Wang, Peirong Liu, Camille Couprie

    Abstract: As the quality of few shot facial animation from landmarks increases, new applications become possible, such as ultra low bandwidth video chat compression with a high degree of realism. However, there are some important challenges to tackle in order to improve the experience in real world conditions. In particular, the current approaches fail to represent profile views without distortions, while r…

    Submitted 16 March, 2022; originally announced March 2022.

  8. arXiv:2101.02258

    cs.CL

    Can RNNs learn Recursive Nested Subject-Verb Agreements?

    Authors: Yair Lakretz, Théo Desbordes, Jean-Rémi King, Benoît Crabbé, Maxime Oquab, Stanislas Dehaene

    Abstract: One of the fundamental principles of contemporary linguistics states that language processing requires the ability to extract recursively nested tree structures. However, it remains unclear whether and how this code could be implemented in neural circuits. Recent advances in Recurrent Neural Networks (RNNs), which achieve near-human performance in some language tasks, provide a compelling model to…

    Submitted 6 January, 2021; originally announced January 2021.

  9. arXiv:2012.00328

    cs.CV cs.LG

    Low Bandwidth Video-Chat Compression using Deep Generative Models

    Authors: Maxime Oquab, Pierre Stock, Oran Gafni, Daniel Haziza, Tao Xu, Peizhao Zhang, Onur Celebi, Yana Hasson, Patrick Labatut, Bobo Bose-Kolanu, Thibault Peyronel, Camille Couprie

    Abstract: To unlock video chat for hundreds of millions of people hindered by poor connectivity or unaffordable data costs, we propose to authentically reconstruct faces on the receiver's device using facial landmarks extracted at the sender's side and transmitted over the network. In this context, we discuss and evaluate the benefits and disadvantages of several deep adversarial approaches. In particular,…

    Submitted 1 December, 2020; originally announced December 2020.

    Comments: 11 pages

  10. arXiv:1902.08401

    cs.LG stat.ML

    Learning about an exponential amount of conditional distributions

    Authors: Mohamed Ishmael Belghazi, Maxime Oquab, Yann LeCun, David Lopez-Paz

    Abstract: We introduce the Neural Conditioner (NC), a self-supervised machine able to learn about all the conditional distributions of a random vector $X$. The NC is a function $NC(x \cdot a, a, r)$ that leverages adversarial training to match each conditional distribution $P(X_r|X_a=x_a)$. After training, the NC generalizes to sample from conditional distributions never seen, including the joint distributi…

    Submitted 22 February, 2019; originally announced February 2019.

    Comments: 8 pages, 7 figures

  11. arXiv:1712.07822

    stat.ML cs.AI cs.LG

    Geometrical Insights for Implicit Generative Modeling

    Authors: Leon Bottou, Martin Arjovsky, David Lopez-Paz, Maxime Oquab

    Abstract: Learning algorithms for implicit generative models can optimize a variety of criteria that measure how the data distribution differs from the implicit model distribution, including the Wasserstein distance, the Energy distance, and the Maximum Mean Discrepancy criterion. A careful look at the geometries induced by these distances on the space of probability measures reveals interesting differences…

    Submitted 21 August, 2019; v1 submitted 21 December, 2017; originally announced December 2017.

    Comments: this version fixes a typo in a definition

  12. arXiv:1609.04331

    cs.CV

    ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization

    Authors: Vadim Kantorov, Maxime Oquab, Minsu Cho, Ivan Laptev

    Abstract: We aim to localize objects in images using image-level supervision only. Previous approaches to this problem mainly focus on discriminative object regions and often fail to locate precise object boundaries. We address this problem by introducing two types of context-aware guidance models, additive and contrastive models, that leverage their surrounding context regions to improve localization. The…

    Submitted 14 September, 2016; originally announced September 2016.

    Comments: Accepted paper at ECCV2016. The website and code is at http://www.di.ens.fr/willow/research/contextlocnet