Showing 1–17 of 17 results for author: Revaud, J

  1. arXiv:2406.09756

    cs.CV

    Grounding Image Matching in 3D with MASt3R

    Authors: Vincent Leroy, Yohann Cabon, Jérôme Revaud

    Abstract: Image Matching is a core component of all best-performing algorithms and pipelines in 3D vision. Yet despite matching being fundamentally a 3D problem, intrinsically linked to camera pose and scene geometry, it is typically treated as a 2D problem. This makes sense as the goal of matching is to establish correspondences between 2D pixel fields, but also seems like a potentially hazardous choice. I…

    Submitted 14 June, 2024; originally announced June 2024.

  2. arXiv:2312.14132

    cs.CV

    DUSt3R: Geometric 3D Vision Made Easy

    Authors: Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, Jerome Revaud

    Abstract: Multi-view stereo reconstruction (MVS) in the wild requires to first estimate the camera parameters e.g. intrinsic and extrinsic parameters. These are usually tedious and cumbersome to obtain, yet they are mandatory to triangulate corresponding pixels in 3D space, which is the core of all best performing MVS algorithms. In this work, we take an opposite stance and introduce DUSt3R, a radically nov…

    Submitted 21 December, 2023; originally announced December 2023.

  3. arXiv:2310.01897

    cs.CV

    MFOS: Model-Free & One-Shot Object Pose Estimation

    Authors: JongMin Lee, Yohann Cabon, Romain Brégier, Sungjoo Yoo, Jerome Revaud

    Abstract: Existing learning-based methods for object pose estimation in RGB images are mostly model-specific or category based. They lack the capability to generalize to new object categories at test time, hence severely hindering their practicability and scalability. Notably, recent attempts have been made to solve this issue, but they still require accurate 3D data of the object surface at both train and…

    Submitted 3 October, 2023; originally announced October 2023.

  4. arXiv:2310.00632

    cs.CV

    Win-Win: Training High-Resolution Vision Transformers from Two Windows

    Authors: Vincent Leroy, Jerome Revaud, Thomas Lucas, Philippe Weinzaepfel

    Abstract: Transformers have become the standard in state-of-the-art vision architectures, achieving impressive performance on both image-level and dense pixelwise tasks. However, training vision transformers for high-resolution pixelwise tasks has a prohibitive cost. Typical solutions boil down to hierarchical architectures, fast and approximate attention, or training on low-resolution crops. This latter so…

    Submitted 22 March, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  5. arXiv:2307.11702

    cs.CV

    SACReg: Scene-Agnostic Coordinate Regression for Visual Localization

    Authors: Jerome Revaud, Yohann Cabon, Romain Brégier, JongMin Lee, Philippe Weinzaepfel

    Abstract: Scene coordinates regression (SCR), i.e., predicting 3D coordinates for every pixel of a given image, has recently shown promising potential. However, existing methods remain limited to small scenes memorized during training, and thus hardly scale to realistic datasets and scenarios. In this paper, we propose a generalized SCR model trained once to be deployed in new test scenes, regardless of the…

    Submitted 30 November, 2023; v1 submitted 21 July, 2023; originally announced July 2023.

  6. arXiv:2211.10408

    cs.CV

    CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow

    Authors: Philippe Weinzaepfel, Thomas Lucas, Vincent Leroy, Yohann Cabon, Vaibhav Arora, Romain Brégier, Gabriela Csurka, Leonid Antsfeld, Boris Chidlovskii, Jérôme Revaud

    Abstract: Despite impressive performance for high-level downstream tasks, self-supervised pre-training methods have not yet fully delivered on dense geometric vision tasks such as stereo matching or optical flow. The application of self-supervised concepts, such as instance discrimination or masked image modeling, to geometric tasks is an active area of research. In this work, we build on the recent cross-v…

    Submitted 18 August, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

    Comments: ICCV 2023

  7. arXiv:2210.10716

    cs.CV

    CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion

    Authors: Philippe Weinzaepfel, Vincent Leroy, Thomas Lucas, Romain Brégier, Yohann Cabon, Vaibhav Arora, Leonid Antsfeld, Boris Chidlovskii, Gabriela Csurka, Jérôme Revaud

    Abstract: Masked Image Modeling (MIM) has recently been established as a potent pre-training paradigm. A pretext task is constructed by masking patches in an input image, and this masked content is then predicted by a neural network using visible patches as sole input. This pre-training leads to state-of-the-art performance when finetuned for high-level semantic tasks, e.g. image classification and object d…

    Submitted 12 January, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022

  8. arXiv:2108.11096

    cs.CV cs.LG

    Learning From Long-Tailed Data With Noisy Labels

    Authors: Shyamgopal Karthik, Jérome Revaud, Boris Chidlovskii

    Abstract: Class imbalance and noisy labels are the norm rather than the exception in many large-scale classification datasets. Nevertheless, most works in machine learning typically assume balanced and clean data. There have been some recent attempts to tackle, on one side, the problem of learning from noisy labels and, on the other side, learning from long-tailed data. Each group of methods make simplifyin…

    Submitted 12 September, 2021; v1 submitted 25 August, 2021; originally announced August 2021.

  9. arXiv:2007.13867

    cs.CV cs.LG

    Robust Image Retrieval-based Visual Localization using Kapture

    Authors: Martin Humenberger, Yohann Cabon, Nicolas Guerin, Julien Morat, Vincent Leroy, Jérôme Revaud, Philippe Rerole, Noé Pion, Cesar de Souza, Gabriela Csurka

    Abstract: Visual localization tackles the challenge of estimating the camera pose from images by using correspondence analysis between query images and a map. This task is computation and data intensive which poses challenges on thorough evaluation of methods on various datasets. However, in order to further advance in the field, we claim that robust visual localization algorithms should be evaluated on mul…

    Submitted 7 January, 2022; v1 submitted 27 July, 2020; originally announced July 2020.

  10. arXiv:1906.07589

    cs.CV

    Learning with Average Precision: Training Image Retrieval with a Listwise Loss

    Authors: Jerome Revaud, Jon Almazan, Rafael Sampaio de Rezende, Cesar Roberto de Souza

    Abstract: Image retrieval can be formulated as a ranking problem where the goal is to order database images by decreasing similarity to the query. Recent deep models for image retrieval have outperformed traditional methods by leveraging ranking-tailored loss functions, but important theoretical and practical problems remain. First, rather than directly optimizing the global ranking, they minimize an upper-…

    Submitted 18 June, 2019; originally announced June 2019.

  11. arXiv:1906.06195

    cs.CV

    R2D2: Repeatable and Reliable Detector and Descriptor

    Authors: Jerome Revaud, Philippe Weinzaepfel, César De Souza, Noe Pion, Gabriela Csurka, Yohann Cabon, Martin Humenberger

    Abstract: Interest point detection and local feature description are fundamental steps in many computer vision applications. Classical methods for these tasks are based on a detect-then-describe paradigm where separate handcrafted methods are used to first identify repeatable keypoints and then represent them with a local descriptor. Neural networks trained with metric learning losses have recently caught u…

    Submitted 17 June, 2019; v1 submitted 14 June, 2019; originally announced June 2019.

  12. arXiv:1610.07940

    cs.CV

    End-to-end Learning of Deep Visual Representations for Image Retrieval

    Authors: Albert Gordo, Jon Almazan, Jerome Revaud, Diane Larlus

    Abstract: While deep learning has become a key ingredient in the top performing methods for many computer vision tasks, it has failed so far to bring similar improvements to instance-level image retrieval. In this article, we argue that reasons for the underwhelming results of deep methods on image retrieval are threefold: i) noisy training data, ii) inappropriate deep architecture, and iii) suboptimal trai…

    Submitted 5 May, 2017; v1 submitted 25 October, 2016; originally announced October 2016.

    Comments: Accepted for publication at the International Journal of Computer Vision (IJCV). Extended version of our ECCV2016 paper "Deep Image Retrieval: Learning global representations for image search"

  13. arXiv:1604.01325

    cs.CV

    Deep Image Retrieval: Learning global representations for image search

    Authors: Albert Gordo, Jon Almazan, Jerome Revaud, Diane Larlus

    Abstract: We propose a novel approach for instance-level image retrieval. It produces a global and compact fixed-length representation for each image by aggregating many region-wise descriptors. In contrast to previous works employing pre-trained deep networks as a black box to produce features, our method leverages a deep architecture trained for the specific task of image retrieval. Our contribution is tw…

    Submitted 28 July, 2016; v1 submitted 5 April, 2016; originally announced April 2016.

    Comments: ECCV 2016 version + additional results

  14. arXiv:1508.03755

    cs.CV

    Beat-Event Detection in Action Movie Franchises

    Authors: Danila Potapov, Matthijs Douze, Jerome Revaud, Zaid Harchaoui, Cordelia Schmid

    Abstract: While important advances were recently made towards temporally localizing and recognizing specific human actions or activities in videos, efficient detection and classification of long video chunks belonging to semantically defined categories such as "pursuit" or "romance" remains challenging. We introduce a new dataset, Action Movie Franchises, consisting of a collection of Hollywood action movie…

    Submitted 15 August, 2015; originally announced August 2015.

  15. arXiv:1506.07656

    cs.CV

    DeepMatching: Hierarchical Deformable Dense Matching

    Authors: Jerome Revaud, Philippe Weinzaepfel, Zaid Harchaoui, Cordelia Schmid

    Abstract: We introduce a novel matching algorithm, called DeepMatching, to compute dense correspondences between images. DeepMatching relies on a hierarchical, multi-layer, correlational architecture designed for matching images and was inspired by deep convolutional approaches. The proposed matching algorithm can handle non-rigid deformations and repetitive textures and efficiently determines dense corresp…

    Submitted 8 October, 2015; v1 submitted 25 June, 2015; originally announced June 2015.

  16. arXiv:1506.02588

    cs.CV

    Circulant temporal encoding for video retrieval and temporal alignment

    Authors: Matthijs Douze, Jérôme Revaud, Jakob Verbeek, Hervé Jégou, Cordelia Schmid

    Abstract: We address the problem of specific video event retrieval. Given a query video of a specific event, e.g., a concert of Madonna, the goal is to retrieve other videos of the same event that temporally overlap with the query. Our approach encodes the frame descriptors of a video to jointly represent their appearance and temporal order. It exploits the properties of circulant matrices to efficiently co…

    Submitted 30 November, 2015; v1 submitted 8 June, 2015; originally announced June 2015.

  17. arXiv:1501.02565

    cs.CV

    EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow

    Authors: Jerome Revaud, Philippe Weinzaepfel, Zaid Harchaoui, Cordelia Schmid

    Abstract: We propose a novel approach for optical flow estimation, targeted at large displacements with significant occlusions. It consists of two steps: i) dense matching by edge-preserving interpolation from a sparse set of matches; ii) variational energy minimization initialized with the dense matches. The sparse-to-dense interpolation relies on an appropriate choice of the distance, namely an edge-awa…

    Submitted 19 May, 2015; v1 submitted 12 January, 2015; originally announced January 2015.