Showing 1–15 of 15 results for author: Cabon, Y

  1. arXiv:2406.09756

    cs.CV

    Grounding Image Matching in 3D with MASt3R

    Authors: Vincent Leroy, Yohann Cabon, Jérôme Revaud

    Abstract: Image Matching is a core component of all best-performing algorithms and pipelines in 3D vision. Yet despite matching being fundamentally a 3D problem, intrinsically linked to camera pose and scene geometry, it is typically treated as a 2D problem. This makes sense as the goal of matching is to establish correspondences between 2D pixel fields, but also seems like a potentially hazardous choice. I…

    Submitted 14 June, 2024; originally announced June 2024.

  2. arXiv:2312.14132

    cs.CV

    DUSt3R: Geometric 3D Vision Made Easy

    Authors: Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, Jerome Revaud

    Abstract: Multi-view stereo reconstruction (MVS) in the wild requires to first estimate the camera parameters e.g. intrinsic and extrinsic parameters. These are usually tedious and cumbersome to obtain, yet they are mandatory to triangulate corresponding pixels in 3D space, which is the core of all best performing MVS algorithms. In this work, we take an opposite stance and introduce DUSt3R, a radically nov…

    Submitted 21 December, 2023; originally announced December 2023.

  3. arXiv:2310.01897

    cs.CV

    MFOS: Model-Free & One-Shot Object Pose Estimation

    Authors: JongMin Lee, Yohann Cabon, Romain Brégier, Sungjoo Yoo, Jerome Revaud

    Abstract: Existing learning-based methods for object pose estimation in RGB images are mostly model-specific or category based. They lack the capability to generalize to new object categories at test time, hence severely hindering their practicability and scalability. Notably, recent attempts have been made to solve this issue, but they still require accurate 3D data of the object surface at both train and…

    Submitted 3 October, 2023; originally announced October 2023.

  4. arXiv:2307.11702

    cs.CV

    SACReg: Scene-Agnostic Coordinate Regression for Visual Localization

    Authors: Jerome Revaud, Yohann Cabon, Romain Brégier, JongMin Lee, Philippe Weinzaepfel

    Abstract: Scene coordinates regression (SCR), i.e., predicting 3D coordinates for every pixel of a given image, has recently shown promising potential. However, existing methods remain limited to small scenes memorized during training, and thus hardly scale to realistic datasets and scenarios. In this paper, we propose a generalized SCR model trained once to be deployed in new test scenes, regardless of the…

    Submitted 30 November, 2023; v1 submitted 21 July, 2023; originally announced July 2023.

  5. arXiv:2211.10408

    cs.CV

    CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow

    Authors: Philippe Weinzaepfel, Thomas Lucas, Vincent Leroy, Yohann Cabon, Vaibhav Arora, Romain Brégier, Gabriela Csurka, Leonid Antsfeld, Boris Chidlovskii, Jérôme Revaud

    Abstract: Despite impressive performance for high-level downstream tasks, self-supervised pre-training methods have not yet fully delivered on dense geometric vision tasks such as stereo matching or optical flow. The application of self-supervised concepts, such as instance discrimination or masked image modeling, to geometric tasks is an active area of research. In this work, we build on the recent cross-v…

    Submitted 18 August, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

    Comments: ICCV 2023

  6. arXiv:2210.10716

    cs.CV

    CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion

    Authors: Philippe Weinzaepfel, Vincent Leroy, Thomas Lucas, Romain Brégier, Yohann Cabon, Vaibhav Arora, Leonid Antsfeld, Boris Chidlovskii, Gabriela Csurka, Jérôme Revaud

    Abstract: Masked Image Modeling (MIM) has recently been established as a potent pre-training paradigm. A pretext task is constructed by masking patches in an input image, and this masked content is then predicted by a neural network using visible patches as sole input. This pre-training leads to state-of-the-art performance when finetuned for high-level semantic tasks, e.g. image classification and object d…

    Submitted 12 January, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022

  7. Investigating the Role of Image Retrieval for Visual Localization -- An exhaustive benchmark

    Authors: Martin Humenberger, Yohann Cabon, Noé Pion, Philippe Weinzaepfel, Donghwan Lee, Nicolas Guérin, Torsten Sattler, Gabriela Csurka

    Abstract: Visual localization, i.e., camera pose estimation in a known scene, is a core component of technologies such as autonomous driving and augmented reality. State-of-the-art localization approaches often rely on image retrieval techniques for one of two purposes: (1) provide an approximate pose estimate or (2) determine which parts of the scene are potentially visible in a given query image. It is co…

    Submitted 31 May, 2022; originally announced May 2022.

    Comments: International Journal of Computer Vision (2022). arXiv admin note: text overlap with arXiv:2011.11946

  8. arXiv:2105.08941

    cs.CV

    Large-scale Localization Datasets in Crowded Indoor Spaces

    Authors: Donghwan Lee, Soohyun Ryu, Suyong Yeon, Yonghan Lee, Deokhwa Kim, Cheolho Han, Yohann Cabon, Philippe Weinzaepfel, Nicolas Guérin, Gabriela Csurka, Martin Humenberger

    Abstract: Estimating the precise location of a camera using visual localization enables interesting applications such as augmented reality or robot navigation. This is particularly useful in indoor environments where other localization technologies, such as GNSS, fail. Indoor spaces impose interesting challenges on visual localization algorithms: occlusions due to people, textureless surfaces, large viewpoi…

    Submitted 19 May, 2021; originally announced May 2021.

  9. arXiv:2011.11946

    cs.CV cs.LG

    Benchmarking Image Retrieval for Visual Localization

    Authors: Noé Pion, Martin Humenberger, Gabriela Csurka, Yohann Cabon, Torsten Sattler

    Abstract: Visual localization, i.e., camera pose estimation in a known scene, is a core component of technologies such as autonomous driving and augmented reality. State-of-the-art localization approaches often rely on image retrieval techniques for one of two tasks: (1) provide an approximate pose estimate or (2) determine which parts of the scene are potentially visible in a given query image. It is commo…

    Submitted 1 December, 2020; v1 submitted 24 November, 2020; originally announced November 2020.

    Comments: International Conference on 3D Vision, 2020

  10. arXiv:2007.13867

    cs.CV cs.LG

    Robust Image Retrieval-based Visual Localization using Kapture

    Authors: Martin Humenberger, Yohann Cabon, Nicolas Guerin, Julien Morat, Vincent Leroy, Jérôme Revaud, Philippe Rerole, Noé Pion, Cesar de Souza, Gabriela Csurka

    Abstract: Visual localization tackles the challenge of estimating the camera pose from images by using correspondence analysis between query images and a map. This task is computation and data intensive which poses challenges on thorough evaluation of methods on various datasets. However, in order to further advance in the field, we claim that robust visual localization algorithms should be evaluated on mul…

    Submitted 7 January, 2022; v1 submitted 27 July, 2020; originally announced July 2020.

  11. arXiv:2001.10773

    cs.CV cs.RO eess.IV

    Virtual KITTI 2

    Authors: Yohann Cabon, Naila Murray, Martin Humenberger

    Abstract: This paper introduces an updated version of the well-known Virtual KITTI dataset which consists of 5 sequence clones from the KITTI tracking benchmark. In addition, the dataset provides different variants of these sequences such as modified weather conditions (e.g. fog, rain) or modified camera configurations (e.g. rotated by 15 degrees). For each sequence, we provide multiple sets of images conta…

    Submitted 29 January, 2020; originally announced January 2020.

  12. arXiv:1910.06699

    cs.CV cs.LG cs.MM

    Generating Human Action Videos by Coupling 3D Game Engines and Probabilistic Graphical Models

    Authors: César Roberto de Souza, Adrien Gaidon, Yohann Cabon, Naila Murray, Antonio Manuel López

    Abstract: Deep video action recognition models have been highly successful in recent years but require large quantities of manually annotated data, which are expensive and laborious to obtain. In this work, we investigate the generation of synthetic training data for video action recognition, as synthetic data have been successfully used to supervise models for a variety of other computer vision tasks. We p…

    Submitted 12 October, 2019; originally announced October 2019.

    Comments: Pre-print of the article accepted for publication in the Special Issue on Generating Realistic Visual Data of Human Behavior of the International Journal of Computer Vision (IJCV). arXiv admin note: substantial text overlap with arXiv:1612.00881

  13. arXiv:1906.06195

    cs.CV

    R2D2: Repeatable and Reliable Detector and Descriptor

    Authors: Jerome Revaud, Philippe Weinzaepfel, César De Souza, Noe Pion, Gabriela Csurka, Yohann Cabon, Martin Humenberger

    Abstract: Interest point detection and local feature description are fundamental steps in many computer vision applications. Classical methods for these tasks are based on a detect-then-describe paradigm where separate handcrafted methods are used to first identify repeatable keypoints and then represent them with a local descriptor. Neural networks trained with metric learning losses have recently caught u…

    Submitted 17 June, 2019; v1 submitted 14 June, 2019; originally announced June 2019.

  14. arXiv:1612.00881

    cs.CV

    Procedural Generation of Videos to Train Deep Action Recognition Networks

    Authors: César Roberto de Souza, Adrien Gaidon, Yohann Cabon, Antonio Manuel López Peña

    Abstract: Deep learning for human action recognition in videos is making significant progress, but is slowed down by its dependency on expensive manual labeling of large video collections. In this work, we investigate the generation of synthetic training data for action recognition, as it has recently shown promising results for a variety of other computer vision tasks. We propose an interpretable parametri…

    Submitted 19 July, 2017; v1 submitted 2 December, 2016; originally announced December 2016.

    Comments: Accepted for publication at CVPR 2017. http://adas.cvc.uab.es/phav/

  15. arXiv:1605.06457

    cs.CV cs.LG cs.NE stat.ML

    Virtual Worlds as Proxy for Multi-Object Tracking Analysis

    Authors: Adrien Gaidon, Qiao Wang, Yohann Cabon, Eleonora Vig

    Abstract: Modern computer vision algorithms typically require expensive data acquisition and accurate manual labeling. In this work, we instead leverage the recent progress in computer graphics to generate fully labeled, dynamic, and photo-realistic proxy virtual worlds. We propose an efficient real-to-virtual world cloning method, and validate our approach by building and publicly releasing a new video dat…

    Submitted 20 May, 2016; originally announced May 2016.

    Comments: CVPR 2016, Virtual KITTI dataset download at http://www.xrce.xerox.com/Research-Development/Computer-Vision/Proxy-Virtual-Worlds