arXiv:2103.16525

Endo-Depth-and-Motion: Reconstruction and Tracking in Endoscopic Videos using Depth Networks and Photometric Constraints

Authors: David Recasens, José Lamarca, José M. Fácil, J. M. M. Montiel, Javier Civera

Abstract: Estimating a scene reconstruction and the camera motion from in-body videos is challenging due to several factors, e.g. the deformation of in-body cavities or the lack of texture. In this paper we present Endo-Depth-and-Motion, a pipeline that estimates the 6-degrees-of-freedom camera pose and dense 3D scene models from monocular endoscopic videos. Our approach leverages recent advances in self-supervised depth networks to generate pseudo-RGBD frames, then tracks the camera pose using photometric residuals and fuses the registered depth maps in a volumetric representation. We present an extensive experimental evaluation in the public dataset Hamlyn, showing high-quality results and comparisons against relevant baselines. We also release all models and code for future comparisons.

Submitted 3 July, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

arXiv:1904.02028

CAM-Convs: Camera-Aware Multi-Scale Convolutions for Single-View Depth

Authors: Jose M. Facil, Benjamin Ummenhofer, Huizhong Zhou, Luis Montesano, Thomas Brox, Javier Civera

Abstract: Single-view depth estimation suffers from the problem that a network trained on images from one camera does not generalize to images taken with a different camera model. Thus, changing the camera model requires collecting an entirely new training dataset. In this work, we propose a new type of convolution that can take the camera parameters into account, thus allowing neural networks to learn calibration-aware patterns. Experiments confirm that this improves the generalization capabilities of depth prediction networks considerably, and clearly outperforms the state of the art when the train and test images are acquired with different cameras.

Submitted 3 April, 2019; originally announced April 2019.

Comments: Camera ready version for CVPR 2019. Project page: http://webdiis.unizar.es/~jmfacil/camconvs/

arXiv:1903.08094

Corners for Layout: End-to-End Layout Recovery from 360 Images

Authors: Clara Fernandez-Labrador, Jose M. Facil, Alejandro Perez-Yus, Cédric Demonceaux, Javier Civera, Jose J. Guerrero

Abstract: The problem of 3D layout recovery in indoor scenes has been a core research topic for over a decade. However, there are still several major challenges that remain unsolved. Among the most relevant ones, a major part of the state-of-the-art methods make implicit or explicit assumptions on the scenes, e.g. box-shaped or Manhattan layouts. Also, current methods are computationally expensive and not suitable for real-time applications like robot navigation and AR/VR. In this work we present CFL (Corners for Layout), the first end-to-end model for 3D layout recovery on 360 images. Our experimental results show that we outperform the state of the art while relaxing assumptions about the scene and at a lower cost. We also show that our model generalizes better to camera position variations than conventional approaches by using EquiConvs, a type of convolution applied directly on the sphere projection and hence invariant to the equirectangular distortions. CFL Webpage: https://cfernandezlab.github.io/CFL/

Submitted 25 March, 2019; v1 submitted 19 March, 2019; originally announced March 2019.

arXiv:1902.09516

Condition-Invariant Multi-View Place Recognition

Authors: Jose M. Facil, Daniel Olid, Luis Montesano, Javier Civera

Abstract: Visual place recognition is particularly challenging when places suffer changes in their appearance. Such changes are indeed common, e.g., due to weather, night/day or seasons. In this paper we leverage recent research using deep networks, and explore how they can be improved by exploiting the temporal sequence information. Specifically, we propose 3 different alternatives (Descriptor Grouping, Fusion and Recurrent Descriptors) for deep networks to use several frames of a sequence. We show that our approaches produce more compact and better-performing descriptors than single- and multi-view baselines in the literature on two public databases.

Submitted 25 February, 2019; originally announced February 2019.

Comments: In submission. Project website: http://webdiis.unizar.es/~jmfacil/cimvpr/

arXiv:1808.09879

PanoRoom: From the Sphere to the 3D Layout

Authors: Clara Fernandez-Labrador, Jose M. Facil, Alejandro Perez-Yus, Cedric Demonceaux, Jose J. Guerrero

Abstract: We propose a novel FCN able to work with omnidirectional images that outputs accurate probability maps representing the main structure of indoor scenes, and that is able to generalize to different data. Our approach handles occlusions and recovers complex-shaped rooms more faithfully to the actual shape of the real scenes. We outperform the state of the art not only in accuracy of the 3D models but also in speed.

Submitted 29 August, 2018; originally announced August 2018.

arXiv:1808.06516

Single-View Place Recognition under Seasonal Changes

Authors: Daniel Olid, José M. Fácil, Javier Civera

Abstract: Single-view place recognition, which we can define as finding an image that corresponds to the same place as a given query image, is a key capability for autonomous navigation and mapping. Although there has been a considerable amount of research on the topic, the high degree of image variability (with viewpoint, illumination or occlusions, for example) makes it a research challenge. One of the particular challenges, which we address in this work, is weather variation. Seasonal changes can produce drastic appearance changes that classic low-level features do not model properly. Our contributions in this paper are twofold. First, we pre-process and propose a partition for the Nordland dataset, frequently used for place recognition research without consensus on the partitions. Second, we evaluate several neural network architectures such as pre-trained, siamese and triplet for this problem. Our best results outperform the state of the art of the field. A video showing our results can be found at https://youtu.be/VrlxsYZoHDM. The partitioned version of the Nordland dataset is available at http://webdiis.unizar.es/~jmfacil/pr-nordland/.

Submitted 20 August, 2018; originally announced August 2018.

Comments: Accepted at 10th Planning, Perception and Navigation for Intelligent Vehicles (PPNIV'18), Workshop at IROS 2018

arXiv:1806.05620

doi 10.1109/LRA.2018.2860039

DynaSLAM: Tracking, Mapping and Inpainting in Dynamic Scenes

Authors: Berta Bescos, José M. Fácil, Javier Civera, José Neira

Abstract: The assumption of scene rigidity is typical in SLAM algorithms. Such a strong assumption limits the use of most visual SLAM systems in populated real-world environments, which are the target of several relevant applications like service robotics or autonomous vehicles. In this paper we present DynaSLAM, a visual SLAM system that, building on ORB-SLAM2 [1], adds the capabilities of dynamic object detection and background inpainting. DynaSLAM is robust in dynamic scenarios for monocular, stereo and RGB-D configurations. We are capable of detecting the moving objects either by multi-view geometry, deep learning or both. Having a static map of the scene allows inpainting the frame background that has been occluded by such dynamic objects. We evaluate our system in public monocular, stereo and RGB-D datasets. We study the impact of several accuracy/speed trade-offs to assess the limits of the proposed methodology. DynaSLAM outperforms standard visual SLAM baselines in accuracy in highly dynamic scenarios. It also estimates a map of the static parts of the scene, which is a must for long-term applications in real-world environments.

Submitted 15 August, 2018; v1 submitted 14 June, 2018; originally announced June 2018.

Comments: This work has been accepted at IEEE Robotics and Automation Letters, and will be presented at the IEEE Conference on Intelligent Robots and Systems 2018

Journal ref: IEEE Robotics and Automation Letters (Volume: 3, Issue: 4, Oct. 2018)

arXiv:1611.07245

doi 10.1109/LRA.2017.2715400

Single-View and Multi-View Depth Fusion

Authors: José M. Fácil, Alejo Concha, Luis Montesano, Javier Civera

Abstract: Dense and accurate 3D mapping from a monocular sequence is a key technology for several applications and still an open research area. This paper leverages recent results on single-view CNN-based depth estimation and fuses them with multi-view depth estimation. Both approaches present complementary strengths. Multi-view depth is highly accurate but only in high-texture areas and high-parallax cases. Single-view depth captures the local structure of mid-level regions, including texture-less areas, but the estimated depth lacks global coherence. The single and multi-view fusion we propose is challenging in several aspects. First, both depths are related by a deformation that depends on the image content. Second, the selection of multi-view points of high accuracy might be difficult for low-parallax configurations. We present contributions for both problems. Our results in the public datasets of NYUv2 and TUM show that our algorithm outperforms the individual single and multi-view approaches. A video showing the key aspects of mapping in our Single and Multi-view depth proposal is available at https://youtu.be/ipc5HukTb4k

Submitted 27 June, 2017; v1 submitted 22 November, 2016; originally announced November 2016.

Comments: Accepted for publication in IEEE Robotics and Automation Letters

Showing 1–8 of 8 results for author: Fácil, J M