Showing 1–29 of 29 results for author: Caron, M

Searching in archive cs.

  1. arXiv:2403.02041

    cs.CV

    A Generative Approach for Wikipedia-Scale Visual Entity Recognition

    Authors: Mathilde Caron, Ahmet Iscen, Alireza Fathi, Cordelia Schmid

    Abstract: In this paper, we address web-scale visual entity recognition, specifically the task of mapping a given query image to one of the 6 million existing entities in Wikipedia. One way of approaching a problem of such scale is using dual-encoder models (eg CLIP), where all the entity names and query images are embedded into a unified space, paving the way for an approximate k-NN search. Alternatively,…

    Submitted 21 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: CVPR2024

  2. arXiv:2312.08825

    cs.CV

    Guided Diffusion from Self-Supervised Diffusion Features

    Authors: Vincent Tao Hu, Yunlu Chen, Mathilde Caron, Yuki M. Asano, Cees G. M. Snoek, Bjorn Ommer

    Abstract: Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or classifier pretraining. That is why guidance was harnessed from self-supervised learning backbones, like DINO. However, recent studies have revealed that the feature representation derived from diffusion model itself is discriminative for numerous downstream tasks a…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Work In Progress

  3. arXiv:2310.17209

    cs.CV cs.LG

    Weakly-Supervised Surgical Phase Recognition

    Authors: Roy Hirsch, Regev Cohen, Mathilde Caron, Tomer Golany, Daniel Freedman, Ehud Rivlin

    Abstract: A key element of computer-assisted surgery systems is phase recognition of surgical videos. Existing phase recognition algorithms require frame-wise annotation of a large number of videos, which is time and money consuming. In this work we join concepts of graph segmentation with self-supervised learning to derive a random-walk solution for per-frame phase prediction. Furthermore, we utilize withi…

    Submitted 26 October, 2023; originally announced October 2023.

  4. Self-Supervised Learning for Endoscopic Video Analysis

    Authors: Roy Hirsch, Mathilde Caron, Regev Cohen, Amir Livne, Ron Shapiro, Tomer Golany, Roman Goldenberg, Daniel Freedman, Ehud Rivlin

    Abstract: Self-supervised learning (SSL) has led to important breakthroughs in computer vision by allowing learning from large amounts of unlabeled data. As such, it might have a pivotal role to play in biomedicine where annotating data requires a highly specialized expertise. Yet, there are many healthcare domains for which SSL has not been extensively explored. One such domain is endoscopy, minimally inva…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: Accepted to MICCAI 2023

    Journal ref: MICCAI 2023

  5. arXiv:2307.06304

    cs.CV cs.AI cs.LG

    Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

    Authors: Mostafa Dehghani, Basil Mustafa, Josip Djolonga, Jonathan Heek, Matthias Minderer, Mathilde Caron, Andreas Steiner, Joan Puigcerver, Robert Geirhos, Ibrahim Alabdulmohsin, Avital Oliver, Piotr Padlewski, Alexey Gritsenko, Mario Lučić, Neil Houlsby

    Abstract: The ubiquitous and demonstrably suboptimal choice of resizing images to a fixed resolution before processing them with computer vision models has not yet been successfully challenged. However, models such as the Vision Transformer (ViT) offer flexible sequence-based modeling, and hence varying input sequence lengths. We take advantage of this with NaViT (Native Resolution ViT) which uses sequence…

    Submitted 12 July, 2023; originally announced July 2023.

  6. arXiv:2306.07196

    cs.CV

    Retrieval-Enhanced Contrastive Vision-Text Models

    Authors: Ahmet Iscen, Mathilde Caron, Alireza Fathi, Cordelia Schmid

    Abstract: Contrastive image-text models such as CLIP form the building blocks of many state-of-the-art systems. While they excel at recognizing common generic concepts, they still struggle on fine-grained entities which are rare, or even absent from the pre-training dataset. Hence, a key ingredient to their success has been the use of large-scale curated pre-training data aiming at expanding the set of conc…

    Submitted 21 February, 2024; v1 submitted 12 June, 2023; originally announced June 2023.

  7. arXiv:2304.06708

    cs.CV cs.AI cs.CL

    Verbs in Action: Improving verb understanding in video-language models

    Authors: Liliane Momeni, Mathilde Caron, Arsha Nagrani, Andrew Zisserman, Cordelia Schmid

    Abstract: Understanding verbs is crucial to modelling how people and objects interact with each other and the environment through space and time. Recently, state-of-the-art video-language models based on CLIP have been shown to have limited verb understanding and to rely extensively on nouns, restricting their performance in real-world video applications that require action and temporal understanding. In th…

    Submitted 13 April, 2023; originally announced April 2023.

  8. arXiv:2302.05442

    cs.CV cs.AI cs.LG

    Scaling Vision Transformers to 22 Billion Parameters

    Authors: Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, et al. (17 additional authors not shown)

    Abstract: The scaling of Transformers has driven breakthrough capabilities for language models. At present, the largest large language models (LLMs) contain upwards of 100B parameters. Vision Transformers (ViT) have introduced the same architecture to image and video modelling, but these have not yet been successfully scaled to nearly the same degree; the largest dense ViT contains 4B parameters (Chen et al…

    Submitted 10 February, 2023; originally announced February 2023.

  9. arXiv:2212.08013

    cs.CV cs.AI cs.LG

    FlexiViT: One Model for All Patch Sizes

    Authors: Lucas Beyer, Pavel Izmailov, Alexander Kolesnikov, Mathilde Caron, Simon Kornblith, Xiaohua Zhai, Matthias Minderer, Michael Tschannen, Ibrahim Alabdulmohsin, Filip Pavetic

    Abstract: Vision Transformers convert images to sequences by slicing them into patches. The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost, but changing the patch size typically requires retraining the model. In this paper, we demonstrate that simply randomizing the patch size at training time leads to a single set of w…

    Submitted 23 March, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: Code and pre-trained models available at https://github.com/google-research/big_vision. All authors made significant technical contributions. CVPR 2023

  10. arXiv:2212.02400

    cs.CV

    Location-Aware Self-Supervised Transformers for Semantic Segmentation

    Authors: Mathilde Caron, Neil Houlsby, Cordelia Schmid

    Abstract: Pixel-level labels are particularly expensive to acquire. Hence, pretraining is a critical step to improve models on a task like semantic segmentation. However, prominent algorithms for pretraining neural networks use image-level objectives, e.g. image classification, image-text alignment a la CLIP, or self-supervised contrastive learning. These objectives do not model spatial information, which m…

    Submitted 15 March, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

  11. arXiv:2210.07198

    cs.CL cs.AI cs.LG

    Towards Trustworthy Automatic Diagnosis Systems by Emulating Doctors' Reasoning with Deep Reinforcement Learning

    Authors: Arsene Fansi Tchango, Rishab Goel, Julien Martel, Zhi Wen, Gaetan Marceau Caron, Joumana Ghosn

    Abstract: The automation of the medical evidence acquisition and diagnosis process has recently attracted increasing attention in order to reduce the workload of doctors and democratize access to medical care. However, most works proposed in the machine learning literature focus solely on improving the prediction accuracy of a patient's pathology. We argue that this objective is insufficient to ensure docto…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: Camera ready. NeurIPS 2022

  12. arXiv:2210.04485

    cs.CV cs.LG

    A Memory Transformer Network for Incremental Learning

    Authors: Ahmet Iscen, Thomas Bird, Mathilde Caron, Alireza Fathi, Cordelia Schmid

    Abstract: We study class-incremental learning, a training setup in which new classes of data are observed over time for the model to learn from. Despite the straightforward problem formulation, the naive application of classification models to class-incremental learning results in the "catastrophic forgetting" of previously seen classes. One of the most successful existing methods has been the use of a memo…

    Submitted 10 October, 2022; originally announced October 2022.

  13. arXiv:2204.07141

    cs.LG cs.AI cs.CV eess.IV

    Masked Siamese Networks for Label-Efficient Learning

    Authors: Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas

    Abstract: We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations. Our approach matches the representation of an image view containing randomly masked patches to the representation of the original unmasked image. This self-supervised pre-training strategy is particularly scalable when applied to Vision Transformers since only the unmasked patches are…

    Submitted 14 April, 2022; originally announced April 2022.

  14. arXiv:2202.08360

    cs.CV cs.AI cs.CY

    Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision

    Authors: Priya Goyal, Quentin Duval, Isaac Seessel, Mathilde Caron, Ishan Misra, Levent Sagun, Armand Joulin, Piotr Bojanowski

    Abstract: Discriminative self-supervised learning allows training models on any random group of internet images, and possibly recover salient information that helps differentiate between the images. Applied to ImageNet, this leads to object centric features that perform on par with supervised features on most object-centric downstream tasks. In this work, we question if using this ability, we can learn any…

    Submitted 22 February, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

  15. arXiv:2112.09118

    cs.IR cs.AI cs.CL

    Unsupervised Dense Information Retrieval with Contrastive Learning

    Authors: Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, Edouard Grave

    Abstract: Recently, information retrieval has seen the emergence of dense retrievers, using neural networks, as an alternative to classical sparse methods based on term-frequency. These models have obtained state-of-the-art results on datasets and tasks where large training sets are available. However, they do not transfer well to new applications with no training data, and are outperformed by unsupervised…

    Submitted 29 August, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

  16. arXiv:2106.09681

    cs.CV cs.LG

    XCiT: Cross-Covariance Image Transformers

    Authors: Alaaeldin El-Nouby, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, Armand Joulin, Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Hervé Jegou

    Abstract: Following their success in natural language processing, transformers have recently shown much promise for computer vision. The self-attention operation underlying transformers yields global interactions between all tokens, i.e. words or image patches, and enables flexible modelling of image data beyond the local interactions of convolutions. This flexibility, however, comes with a quadratic comple…

    Submitted 18 June, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

  17. arXiv:2105.03404

    cs.CV

    ResMLP: Feedforward networks for image classification with data-efficient training

    Authors: Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, Edouard Grave, Gautier Izacard, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou

    Abstract: We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using hea…

    Submitted 10 June, 2021; v1 submitted 7 May, 2021; originally announced May 2021.

  18. arXiv:2104.14294

    cs.CV

    Emerging Properties in Self-Supervised Vision Transformers

    Authors: Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin

    Abstract: In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets). Beyond the fact that adapting self-supervised methods to this architecture works particularly well, we make the following observations: first, self-supervised ViT features contain explicit information about the semantic segmentatio…

    Submitted 24 May, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

    Comments: 21 pages

  19. arXiv:2104.13963

    cs.CV cs.AI cs.LG eess.IV

    Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples

    Authors: Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Armand Joulin, Nicolas Ballas, Michael Rabbat

    Abstract: This paper proposes a novel method of learning by predicting view assignments with support samples (PAWS). The method trains a model to minimize a consistency loss, which ensures that different views of the same unlabeled instance are assigned similar pseudo-labels. The pseudo-labels are generated non-parametrically, by comparing the representations of the image views to those of a set of randomly…

    Submitted 30 July, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

    Journal ref: ICCV 2021

  20. A Practical Survey on Faster and Lighter Transformers

    Authors: Quentin Fournier, Gaétan Marceau Caron, Daniel Aloise

    Abstract: Recurrent neural networks are effective models to process sequences. However, they are unable to learn long-term dependencies because of their inherent sequential nature. As a solution, Vaswani et al. introduced the Transformer, a model solely based on the attention mechanism that is able to relate any two positions of the input sequence, hence modelling arbitrary long dependencies. The Transforme…

    Submitted 27 March, 2023; v1 submitted 26 March, 2021; originally announced March 2021.

    Comments: ACM Computing Surveys; 40 pages, 18 figures, 4 tables

  21. arXiv:2103.01988

    cs.CV cs.AI

    Self-supervised Pretraining of Visual Features in the Wild

    Authors: Priya Goyal, Mathilde Caron, Benjamin Lefaudeux, Min Xu, Pengchao Wang, Vivek Pai, Mannat Singh, Vitaliy Liptchinsky, Ishan Misra, Armand Joulin, Piotr Bojanowski

    Abstract: Recently, self-supervised learning methods like MoCo, SimCLR, BYOL and SwAV have reduced the gap with supervised methods. These results have been achieved in a control environment, that is the highly curated ImageNet dataset. However, the premise of self-supervised learning is that it can learn from any random image and from any unbounded dataset. In this work, we explore if self-supervision lives…

    Submitted 5 March, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

  22. arXiv:2010.16004

    cs.CY cs.LG cs.MA cs.SI

    COVI-AgentSim: an Agent-based Model for Evaluating Methods of Digital Contact Tracing

    Authors: Prateek Gupta, Tegan Maharaj, Martin Weiss, Nasim Rahaman, Hannah Alsdurf, Abhinav Sharma, Nanor Minoyan, Soren Harnois-Leblanc, Victor Schmidt, Pierre-Luc St. Charles, Tristan Deleu, Andrew Williams, Akshay Patel, Meng Qu, Olexa Bilaniuk, Gaétan Marceau Caron, Pierre Luc Carrier, Satya Ortiz-Gagné, Marc-Andre Rousseau, David Buckeridge, Joumana Ghosn, Yang Zhang, Bernhard Schölkopf, Jian Tang, Irina Rish, et al. (4 additional authors not shown)

    Abstract: The rapid global spread of COVID-19 has led to an unprecedented demand for effective methods to mitigate the spread of the disease, and various digital contact tracing (DCT) methods have emerged as a component of the solution. In order to make informed public health choices, there is a need for tools which allow evaluation and comparison of DCT methods. We introduce an agent-based compartmental si…

    Submitted 29 October, 2020; originally announced October 2020.

  23. arXiv:2010.12536

    cs.LG cs.AI cs.MA cs.SI

    Predicting Infectiousness for Proactive Contact Tracing

    Authors: Yoshua Bengio, Prateek Gupta, Tegan Maharaj, Nasim Rahaman, Martin Weiss, Tristan Deleu, Eilif Muller, Meng Qu, Victor Schmidt, Pierre-Luc St-Charles, Hannah Alsdurf, Olexa Bilanuik, David Buckeridge, Gáetan Marceau Caron, Pierre-Luc Carrier, Joumana Ghosn, Satya Ortiz-Gagne, Chris Pal, Irina Rish, Bernhard Schölkopf, Abhinav Sharma, Jian Tang, Andrew Williams

    Abstract: The COVID-19 pandemic has spread rapidly worldwide, overwhelming manual contact tracing in many countries and resulting in widespread lockdowns for emergency containment. Large-scale digital contact tracing (DCT) has emerged as a potential solution to resume economic and social activity while minimizing spread of the virus. Various DCT methods have been proposed, each making trade-offs between pri…

    Submitted 23 October, 2020; originally announced October 2020.

  24. arXiv:2006.09882

    cs.CV

    Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

    Authors: Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin

    Abstract: Unsupervised image representations have significantly reduced the gap with supervised pretraining, notably with the recent achievements of contrastive learning methods. These contrastive methods typically work online and rely on a large number of explicit pairwise feature comparisons, which is computationally challenging. In this paper, we propose an online algorithm, SwAV, that takes advantage of…

    Submitted 8 January, 2021; v1 submitted 17 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020

  25. arXiv:2006.08140

    cs.CY cs.AI cs.LG

    The Social Contract for AI

    Authors: Mirka Snyder Caron, Abhishek Gupta

    Abstract: Like any technology, AI systems come with inherent risks and potential benefits. It comes with potential disruption of established norms and methods of work, societal impacts and externalities. One may think of the adoption of technology as a form of social contract, which may evolve or fluctuate in time, scale, and impact. It is important to keep in mind that for AI, meeting the expectations of t…

    Submitted 15 June, 2020; originally announced June 2020.

    Comments: Accepted paper for presentation at the IJCAI 2019 AI for Social Good workshop

  26. arXiv:2006.07025

    cs.CY

    Response to Office of the Privacy Commissioner of Canada Consultation Proposals pertaining to amendments to PIPEDA relative to Artificial Intelligence

    Authors: Mirka Snyder Caron, Abhishek Gupta

    Abstract: In February 2020, the Montreal AI Ethics Institute (MAIEI) was invited by the Office of the Privacy Commissioner of Canada (OPCC) to provide for comments both at a closed roundtable and in writing on the OPCC consultation proposal for amendments relative to Artificial Intelligence (AI), to the Canadian privacy legislation, the Personal Information Protection and Electronic Documents Act (PIPEDA).…

    Submitted 12 June, 2020; originally announced June 2020.

    Comments: Submitted to the Office of the Privacy Commissioner of Canada

  27. arXiv:2001.03554

    cs.CV cs.LG cs.NE

    Pruning Convolutional Neural Networks with Self-Supervision

    Authors: Mathilde Caron, Ari Morcos, Piotr Bojanowski, Julien Mairal, Armand Joulin

    Abstract: Convolutional neural networks trained without supervision come close to matching performance with supervised pre-training, but sometimes at the cost of an even higher number of parameters. Extracting subnetworks from these large unsupervised convnets with preserved performance is of particular interest to make them less computationally intensive. Typical pruning methods operate during training on…

    Submitted 10 January, 2020; originally announced January 2020.

  28. arXiv:1905.01278

    cs.CV

    Unsupervised Pre-Training of Image Features on Non-Curated Data

    Authors: Mathilde Caron, Piotr Bojanowski, Julien Mairal, Armand Joulin

    Abstract: Pre-training general-purpose visual features with convolutional neural networks without relying on annotations is a challenging and important task. Most recent efforts in unsupervised feature learning have focused on either small or highly curated datasets like ImageNet, whereas using uncurated raw datasets was found to decrease the feature quality when evaluated on a transfer task. Our goal is to…

    Submitted 13 August, 2019; v1 submitted 3 May, 2019; originally announced May 2019.

    Comments: Accepted at ICCV 2019 (Oral)

  29. arXiv:1807.05520

    cs.CV

    Deep Clustering for Unsupervised Learning of Visual Features

    Authors: Mathilde Caron, Piotr Bojanowski, Armand Joulin, Matthijs Douze

    Abstract: Clustering is a class of unsupervised learning methods that has been extensively applied and studied in computer vision. Little work has been done to adapt it to the end-to-end training of visual features on large scale datasets. In this work, we present DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features. De…

    Submitted 18 March, 2019; v1 submitted 15 July, 2018; originally announced July 2018.

    Comments: Accepted at ECCV 2018