-
3DGS-Calib: 3D Gaussian Splatting for Multimodal SpatioTemporal Calibration
Authors:
Quentin Herau,
Moussab Bennehar,
Arthur Moreau,
Nathan Piasco,
Luis Roldao,
Dzmitry Tsishkou,
Cyrille Migniot,
Pascal Vasseur,
Cédric Demonceaux
Abstract:
Reliable multimodal sensor fusion algorithms require accurate spatiotemporal calibration. Recently, targetless calibration techniques based on implicit neural representations have proven to provide precise and robust results. Nevertheless, such methods are inherently slow to train given the high computational overhead caused by the large number of sampled points required for volume rendering. With the recent introduction of 3D Gaussian Splatting as a faster alternative to implicit representation methods, we propose to leverage this new rendering approach to achieve faster multi-sensor calibration. We introduce 3DGS-Calib, a new calibration method that relies on the speed and rendering accuracy of 3D Gaussian Splatting to achieve multimodal spatiotemporal calibration that is accurate, robust, and with a substantial speed-up compared to methods relying on implicit neural representations. We demonstrate the superiority of our proposal with experimental results on sequences from KITTI-360, a widely used driving dataset.
Submitted 18 March, 2024;
originally announced March 2024.
-
SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians
Authors:
Hiba Dahmani,
Moussab Bennehar,
Nathan Piasco,
Luis Roldao,
Dzmitry Tsishkou
Abstract:
Implicit neural representation methods have shown impressive advancements in learning 3D scenes from unstructured in-the-wild photo collections but are still limited by the large computational cost of volumetric rendering. More recently, 3D Gaussian Splatting emerged as a much faster alternative with superior rendering quality and training efficiency, especially for small-scale and object-centric scenarios. Nevertheless, this technique suffers from poor performance on unstructured in-the-wild data. To tackle this, we extend 3D Gaussian Splatting to handle unstructured image collections. We achieve this by modeling appearance to capture photometric variations in the rendered images. Additionally, we introduce a new mechanism to train transient Gaussians to handle the presence of scene occluders in an unsupervised manner. Experiments on diverse photo collection scenes and multi-pass acquisition of outdoor landmarks show the effectiveness of our method over prior works, achieving state-of-the-art results with improved efficiency.
Submitted 5 April, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
SCILLA: SurfaCe Implicit Learning for Large Urban Area, a volumetric hybrid solution
Authors:
Hala Djeghim,
Nathan Piasco,
Moussab Bennehar,
Luis Roldão,
Dzmitry Tsishkou,
Désiré Sidibé
Abstract:
Neural implicit surface representation methods have recently shown impressive 3D reconstruction results. However, existing solutions struggle to reconstruct urban outdoor scenes due to their large, unbounded, and highly detailed nature. Hence, to achieve accurate reconstructions, additional supervision data such as LiDAR, strong geometric priors, and long training times are required. To tackle such issues, we present SCILLA, a new hybrid implicit surface learning method to reconstruct large driving scenes from 2D images. SCILLA's hybrid architecture models two separate implicit fields: one for the volumetric density and another for the signed distance to the surface. To accurately represent urban outdoor scenarios, we introduce a novel volume-rendering strategy that relies on self-supervised probabilistic density estimation to sample points near the surface and transition progressively from volumetric to surface representation. Our solution permits a proper and fast initialization of the signed distance field without relying on any geometric prior on the scene, compared to concurrent methods. By conducting extensive experiments on four outdoor driving datasets, we show that SCILLA can learn an accurate and detailed 3D surface scene representation in various urban scenarios while being two times faster to train compared to previous state-of-the-art solutions.
Submitted 5 April, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
RoDUS: Robust Decomposition of Static and Dynamic Elements in Urban Scenes
Authors:
Thang-Anh-Quan Nguyen,
Luis Roldão,
Nathan Piasco,
Moussab Bennehar,
Dzmitry Tsishkou
Abstract:
The task of separating dynamic objects from static environments using NeRFs has been widely studied in recent years. However, capturing large-scale scenes still poses a challenge due to their complex geometric structures and unconstrained dynamics. Without the help of 3D motion cues, previous methods often require simplified setups with slow camera motion and only a few/single dynamic actors, leading to suboptimal solutions in most urban setups. To overcome such limitations, we present RoDUS, a pipeline for decomposing static and dynamic elements in urban scenes, with thoughtfully separated NeRF models for moving and non-moving components. Our approach utilizes a robust kernel-based initialization coupled with 4D semantic information to selectively guide the learning process. This strategy enables accurate capturing of the dynamics in the scene, resulting in reduced artifacts caused by NeRF on background reconstruction, all by using self-supervision. Notably, experimental evaluations on KITTI-360 and Pandaset datasets demonstrate the effectiveness of our method in decomposing challenging urban scenes into precise static and dynamic components.
Submitted 14 March, 2024;
originally announced March 2024.
-
SOAC: Spatio-Temporal Overlap-Aware Multi-Sensor Calibration using Neural Radiance Fields
Authors:
Quentin Herau,
Nathan Piasco,
Moussab Bennehar,
Luis Roldão,
Dzmitry Tsishkou,
Cyrille Migniot,
Pascal Vasseur,
Cédric Demonceaux
Abstract:
In rapidly-evolving domains such as autonomous driving, the use of multiple sensors with different modalities is crucial to ensure high operational precision and stability. To correctly exploit the information provided by each sensor in a single common frame, it is essential for these sensors to be accurately calibrated. In this paper, we leverage the ability of Neural Radiance Fields (NeRF) to represent different sensor modalities in a common volumetric representation to achieve robust and accurate spatio-temporal sensor calibration. By designing a partitioning approach based on the visible part of the scene for each sensor, we formulate the calibration problem using only the overlapping areas. This strategy results in a more robust and accurate calibration that is less prone to failure. We demonstrate that our approach works on outdoor urban scenes by validating it on multiple established driving datasets. Results show that our method is able to get better accuracy and robustness compared to existing methods.
Submitted 27 March, 2024; v1 submitted 27 November, 2023;
originally announced November 2023.
-
PlaNeRF: SVD Unsupervised 3D Plane Regularization for NeRF Large-Scale Scene Reconstruction
Authors:
Fusang Wang,
Arnaud Louys,
Nathan Piasco,
Moussab Bennehar,
Luis Roldão,
Dzmitry Tsishkou
Abstract:
Neural Radiance Fields (NeRF) enable 3D scene reconstruction from 2D images and camera poses for Novel View Synthesis (NVS). Although NeRF can produce photorealistic results, it often suffers from overfitting to training views, leading to poor geometry reconstruction, especially in low-texture areas. This limitation restricts many important applications which require accurate geometry, such as extrapolated NVS, HD mapping and scene editing. To address this limitation, we propose a new method to improve NeRF's 3D structure using only RGB images and semantic maps. Our approach introduces a novel plane regularization based on Singular Value Decomposition (SVD), that does not rely on any geometric prior. In addition, we leverage the Structural Similarity Index Measure (SSIM) in our loss design to properly initialize the volumetric representation of NeRF. Quantitative and qualitative results show that our method outperforms popular regularization approaches in accurate geometry reconstruction for large-scale outdoor scenes and achieves SoTA rendering quality on the KITTI-360 NVS benchmark.
Submitted 5 November, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
-
MOISST: Multimodal Optimization of Implicit Scene for SpatioTemporal calibration
Authors:
Quentin Herau,
Nathan Piasco,
Moussab Bennehar,
Luis Roldão,
Dzmitry Tsishkou,
Cyrille Migniot,
Pascal Vasseur,
Cédric Demonceaux
Abstract:
With the recent advances in autonomous driving and the decreasing cost of LiDARs, the use of multimodal sensor systems is on the rise. However, in order to make use of the information provided by a variety of complementary sensors, it is necessary to accurately calibrate them. We take advantage of recent advances in computer graphics and implicit volumetric scene representation to tackle the problem of multi-sensor spatial and temporal calibration. Thanks to a new formulation of the Neural Radiance Field (NeRF) optimization, we are able to jointly optimize calibration parameters along with scene representation based on radiometric and geometric measurements. Our method enables accurate and robust calibration from data captured in uncontrolled and unstructured urban environments, making our solution more scalable than existing calibration solutions. We demonstrate the accuracy and robustness of our method in urban scenes typically encountered in autonomous driving scenarios.
Submitted 21 July, 2023; v1 submitted 6 March, 2023;
originally announced March 2023.
-
3D Semantic Scene Completion: a Survey
Authors:
Luis Roldao,
Raoul de Charette,
Anne Verroust-Blondet
Abstract:
Semantic Scene Completion (SSC) aims to jointly estimate the complete geometry and semantics of a scene, assuming partial sparse input. In the last years following the multiplication of large-scale 3D datasets, SSC has gained significant momentum in the research community because it holds unresolved challenges. Specifically, SSC lies in the ambiguous completion of large unobserved areas and the weak supervision signal of the ground truth. This led to a substantially increasing number of papers on the matter. This survey aims to identify, compare and analyze the techniques providing a critical analysis of the SSC literature on both methods and datasets. Throughout the paper, we provide an in-depth analysis of the existing works covering all choices made by the authors while highlighting the remaining avenues of research. SSC performance of the SoA on the most popular datasets is also evaluated and analyzed.
Submitted 12 July, 2021; v1 submitted 12 March, 2021;
originally announced March 2021.
-
LMSCNet: Lightweight Multiscale 3D Semantic Completion
Authors:
Luis Roldão,
Raoul de Charette,
Anne Verroust-Blondet
Abstract:
We introduce a new approach for multiscale 3D semantic scene completion from voxelized sparse 3D LiDAR scans. As opposed to the literature, we use a 2D UNet backbone with comprehensive multiscale skip connections to enhance feature flow, along with 3D segmentation heads. On the SemanticKITTI benchmark, our method performs on par on semantic completion and better on occupancy completion than all other published methods -- while being significantly lighter and faster. As such it provides a great performance/speed trade-off for mobile-robotics applications. The ablation studies demonstrate our method is robust to lower density inputs, and that it enables very high speed semantic completion at the coarsest level. Our code is available at https://github.com/cv-rits/LMSCNet.
Submitted 25 October, 2020; v1 submitted 24 August, 2020;
originally announced August 2020.
-
Description and Technical specification of Cybernetic Transportation Systems: an urban transportation concept
Authors:
Luis Roldão,
Joshue Pérez,
David González,
and Vicente Milanés
Abstract:
The Cybernetic Transportation Systems (CTS) is an urban mobility concept based on two ideas: car sharing and the automation of dedicated systems with door-to-door capabilities. In the last decade, many European projects have been developed in this context, some of the most important being Cybercars, Cybercars2, CyberMove, CyberC3 and CityMobil. Different companies have developed a first fleet of CTSs in collaboration with research centers around Europe, Asia and America. Building on these previous works, the FP7 project CityMobil2 has been in progress since 2012. Its goal is to solve some of the limitations found so far, including the definition of the legal framework for autonomous vehicles in urban environments. This work describes the different improvements, adaptation and instrumentation of the CTS prototypes involved in European cities. Results show tests in our facilities at INRIA-Rocquencourt (France) and the first showcase at León (Spain).
Submitted 15 August, 2020;
originally announced August 2020.
-
3D Surface Reconstruction from Voxel-based Lidar Data
Authors:
Luis Roldão,
Raoul de Charette,
Anne Verroust-Blondet
Abstract:
To achieve fully autonomous navigation, vehicles need to compute an accurate model of their direct surroundings. In this paper, a 3D surface reconstruction algorithm from heterogeneous density 3D data is presented. The proposed method is based on a TSDF voxel-based representation, where an adaptive neighborhood kernel sourced on a Gaussian confidence evaluation is introduced. This makes it possible to keep a good trade-off between the density of the reconstructed mesh and its accuracy. Experimental evaluations carried out on both synthetic (CARLA) and real (KITTI) 3D data show a good performance compared to a state-of-the-art method used for surface reconstruction.
Submitted 25 June, 2019;
originally announced June 2019.
-
Real-time Dynamic Object Detection for Autonomous Driving using Prior 3D-Maps
Authors:
B Ravi Kiran,
Luis Roldão,
Benat Irastorza,
Renzo Verastegui,
Sebastian Suss,
Senthil Yogamani,
Victor Talpaert,
Alexandre Lepoutre,
Guillaume Trehard
Abstract:
Lidar has become an essential sensor for autonomous driving as it provides reliable depth estimation. Lidar is also the primary sensor used in building 3D maps which can be used even in the case of low-cost systems which do not use Lidar. Computation on Lidar point clouds is intensive as it requires processing of millions of points per second. Additionally, there are many subsequent tasks such as clustering, detection, tracking and classification which make real-time execution challenging. In this paper, we discuss real-time dynamic object detection algorithms which leverage previously mapped Lidar point clouds to reduce processing. The prior 3D maps provide a static background model and we formulate dynamic object detection as a background subtraction problem. Computation and modeling challenges in the mapping and online execution pipeline are described. We propose a rejection cascade architecture to subtract road regions and other 3D regions separately. We implemented an initial version of our proposed algorithm and evaluated the accuracy on the CARLA simulator.
Submitted 5 July, 2019; v1 submitted 28 September, 2018;
originally announced September 2018.
-
A Statistical Update of Grid Representations from Range Sensors
Authors:
Luis Roldão,
Raoul De Charette,
Anne Verroust-Blondet
Abstract:
In a wide range of robotic applications, being able to create a 3D model of the surrounding environment is a key feature for autonomous tasks. In this research report, we present a statistical model to perform 3D reconstructions of the environment from range sensors using an occupancy grid. To do so, we take into account all the available information obtained from the sensor, considering the distances traversed by the rays in each cell and seeking to reduce reconstruction errors caused by discretization. The approach has been validated qualitatively using the KITTI dataset.
Submitted 4 July, 2019; v1 submitted 23 July, 2018;
originally announced July 2018.